ON THE UNBIASED CHARACTER OF LIKELIHOOD-RATIO TESTS
FOR INDEPENDENCE IN NORMAL SYSTEMS

By Joseph F. Daly 


1. Introduction. In the statistical interpretation of experimental data, the basic assumption is, of course, that we are dealing with a sample from a statistical population, the elements of which are characterized by the values of a number of random variables $x^1, \dots, x^m$. But in many cases we are in a position to assume even more, namely, that the population has an elementary probability law $f(x^1, \dots, x^m; \theta_1, \dots, \theta_h)$, where the functional form of $f(x, \theta)$ is definitely specified, although the parameters $\theta_1, \dots, \theta_h$ are left free for the moment to take values corresponding to any point of a set $\Omega$ in an $h$-dimensional space.

Under this assumption, the problem of obtaining from the data further information about the hypothetical distribution law $f(x, \theta)$ is considerably simplified. For it is then equivalent to that of deciding whether or not the data support the hypothesis that the population values of the $\theta$'s correspond to a point in a certain subset $\omega$ of $\Omega$. For example, we may have reason to believe that the population $K$ has a distribution law of the form


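$$f(x^1, x^2; a^1, a^2, A_{11}, A_{12}, A_{22}) = \frac{|A_{ij}|^{1/2}}{2\pi}\exp\Big[-\frac{1}{2}\sum_{i,j=1}^{2} A_{ij}(x^i - a^i)(x^j - a^j)\Big].$$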
Here the set $\Omega$ is composed of all parameter points $(a^1, \dots, A_{22})$ for which the matrix $\|A_{ij}\|$ $(i, j = 1, 2)$ is positive definite and for which $-\infty < a^i < \infty$. We may wish to decide, on the basis of $N$ independent observations $(x^1_\alpha, x^2_\alpha)$ drawn from $K$, whether $A_{12}$ has the value zero for the population in question, without concerning ourselves at all about the values of the remaining parameters; in other words, we may wish to test the hypothesis $H$ that the parameter point corresponding to $K$ lies in that subset of $\Omega$ for which $A_{12} = 0$. One way to test this hypothesis is to select some (measurable) function $g(x)$ whose value can be determined from the data, say


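$$g(x) = \frac{\sum_{\alpha=1}^{N}(x^1_\alpha - \bar{x}^1)(x^2_\alpha - \bar{x}^2)}{\Big[\sum_{\alpha=1}^{N}(x^1_\alpha - \bar{x}^1)^2\,\sum_{\alpha=1}^{N}(x^2_\alpha - \bar{x}^2)^2\Big]^{1/2}}.$$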


Now $g(x)$ is itself a random variable, so that it has a distribution law of its own when its constituent $x$'s are drawn from any particular population $K$. Suppose then we choose a set of values of $g(x)$, say $S$, such that the probability is only .05 that $g(x)$ will lie in the set $S$ when the $x$'s are drawn independently from a population $K$ for which the above hypothesis $H$ is true. Ordinarily we would take $S$ to be of the form $|g(x)| > g_0$, and the test would then reject $H$ at the .05 probability level if the computed value of $|g(x)|$ came out too large. But for all that has been said so far, we are perfectly free to choose a different critical region $S$, and even a different function $g(x)$. The essential elements of this type of test are then a critical region $S$, a function of the data $g$, and a probability level $\varepsilon$, such that the probability is $\varepsilon = .05$, say, that $g$ lies in $S$ when $H$ is true; in employing the test we reject $H$ at the given probability level whenever the sample value of $g$ falls in the critical region.
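The test just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original paper: the function names are hypothetical, and the critical value `g0` is assumed to be supplied by the user (for instance from tables of the null distribution of the correlation coefficient) so that $P(|g| > g_0) = .05$ when $H$ is true.

```python
import math


def sample_correlation(x1, x2):
    """Return g(x), the sample correlation of the pairs (x1[a], x2[a])."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two samples of equal length N >= 2")
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    return s12 / math.sqrt(s11 * s22)


def reject_h(x1, x2, g0):
    """Reject H (A12 = 0) when g falls in the critical region S = {|g| > g0}."""
    return abs(sample_correlation(x1, x2)) > g0


# g0 is assumed to be chosen so that P(|g| > g0) = .05 when H is true.
print(sample_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # prints 0.8
print(reject_h([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], g0=0.9))    # prints False
```

Any other statistic and region could be substituted for `sample_correlation` and `{|g| > g0}`; the point of the sketch is only that a test is fixed by the triple $(g, S, \varepsilon)$.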

By the very nature of the problem, any inferences we make from a sample are subject to possible error. In the kind of test under consideration, the only error we can commit, strictly speaking, is that of rejecting $H$ when it is true (an error of Type I in the terminology of Neyman and Pearson [9]). The risk of such an error is thus known in advance; for if we use the test consistently at, say, the .05 level, we know that the probability is .05 that we shall be led to reject a given hypothesis when it is true. On the other hand, it is quite conceivable that the test may be even less likely to reject $H$ when it is false, or more precisely, when the true $\theta$'s correspond to a point of $\Omega$ which is not in $\omega$. In this event the test is said to be biased. Let us make this term more definite by proposing the following definitions:

Definition I. A test is said to be completely unbiased if it has the property that for any probability level $\varepsilon$ $(0 < \varepsilon < 1)$ the probability of rejecting $H$ is greater when the $\theta$'s correspond to a point of $\Omega - \omega$ than when they correspond to a point of $\omega$.

Definition II. A test is said to be locally unbiased if the set $\Omega$ contains a neighborhood $U$ of $\omega$ such that for any probability level $\varepsilon$ $(0 < \varepsilon < 1)$ the probability of rejecting $H$ is greater when the parameter values correspond to a point of $U - \omega$ than when they correspond to a point of $\omega$.

It is the purpose of this paper to consider the question of bias in connection with the Neyman-Pearson method of likelihood ratios [8] as applied to the testing of what may well be called hypotheses of independence in multivariate normal populations. The likelihood ratio method is undoubtedly a very familiar one, since the vast majority of tests in present statistical practice are based on this method. But for the sake of completeness we shall outline it briefly. Let the distribution law of the population $K$ be of the form $f(x^1, \dots, x^m; \theta_1, \dots, \theta_h)$, where the $\theta$'s may correspond to any point in a set $\Omega$, and let the hypothesis $H$ to be tested be that the $\theta$'s actually belong to the subset $\omega$ of $\Omega$. Form the likelihood function

$$P_N(x; \theta) = \prod_{\alpha=1}^{N} f(x^1_\alpha, \dots, x^m_\alpha; \theta_1, \dots, \theta_h),$$

i.e., the elementary probability law of a sample of $N$ elements drawn independently from $K$. Denote by $P_\Omega(x)$ the maximum of $P_N$ for fixed $x$ when the $\theta$'s are allowed to range over $\Omega$, and denote by $P_\omega(x)$ the corresponding maximum value when the $\theta$'s are restricted to $\omega$. The test criterion is then
$$\lambda = \frac{P_\omega(x)}{P_\Omega(x)}.$$
Evidently $\lambda$ depends only on the observable quantities $x^i_\alpha$, and has the range $0 \le \lambda \le 1$, with a definite probability law depending on that of the basic population $K$. In this method the critical region $S$ is taken to be $0 < \lambda < \lambda_\varepsilon$, where $\lambda_\varepsilon$ is so chosen that the probability $P\{\lambda < \lambda_\varepsilon\}$ is $\varepsilon$ when the parameters of $K$ correspond to a point in $\omega$. (It may be noted here that in all the cases with which we shall have to deal, the probability that $\lambda$ lies in $S$ when $H$ is true is independent of the particular values of the $\theta$'s as long as they correspond to a point of $\omega$.) The reason for taking the critical region to be of the form $0 < \lambda < \lambda_\varepsilon$, and not, say, $\lambda'_\varepsilon < \lambda < \lambda''_\varepsilon$ or $\lambda_\varepsilon < \lambda < 1$, may become clearer when we examine the resulting tests for bias.
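As a concrete check on the definition of $\lambda$, the sketch below computes $P_\Omega$ and $P_\omega$ for the bivariate example of the introduction by evaluating the log-likelihood at the maximizing parameter values: the sample means, and maximum-likelihood variances with divisor $N$, the covariance being set to zero under $\omega$ (for this model, equivalent to $A_{12} = 0$). The function names are illustrative. For this model the ratio works out to $(1 - g^2)^{N/2}$, with $g$ the sample correlation, so small $\lambda$ corresponds to large $|g|$.

```python
import math


def bivariate_normal_loglik(x1, x2, mu1, mu2, s11, s22, s12):
    """log P_N for a bivariate normal sample with means (mu1, mu2) and covariances s."""
    det = s11 * s22 - s12 ** 2
    total = 0.0
    for a, b in zip(x1, x2):
        d1, d2 = a - mu1, b - mu2
        # quadratic form d' Sigma^{-1} d for a 2x2 covariance matrix
        quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total


def likelihood_ratio_correlation(x1, x2):
    """lambda = P_omega / P_Omega for H: A12 = 0 (zero covariance)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # maximum-likelihood estimates (divisor N) of the covariance matrix
    s11 = sum((a - m1) ** 2 for a in x1) / n
    s22 = sum((b - m2) ** 2 for b in x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    log_p_big_omega = bivariate_normal_loglik(x1, x2, m1, m2, s11, s22, s12)
    log_p_omega = bivariate_normal_loglik(x1, x2, m1, m2, s11, s22, 0.0)
    return math.exp(log_p_omega - log_p_big_omega)
```

For the data `[1, 2, 3, 4, 5]`, `[2, 1, 4, 3, 5]` (where $g = 0.8$ and $N = 5$), the function returns $0.36^{5/2} = 0.07776$ up to rounding.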

The recent work of Neyman and Pearson [10] has led them to lay considerable stress on the importance of unbiased tests. And though their attention has been directed mainly to the broader outlines of the theory of testing hypotheses, they have stimulated other writers to study particular tests of great practical importance. P. C. Tang [11] has obtained the general sampling distribution of $1 - \lambda^{2/N}$ for what we shall call the regression problem with one dependent variate, and has given tables for $P(\lambda < \lambda_\varepsilon)$ which should be extremely useful and which essentially prove the unbiased character of the test. His article also contains an excellent discussion of the manner in which this test is related to the well-known tests of linear hypotheses [7] and to the ordinary analysis of variance. P. L. Hsu [6] has shown that this same distribution is fundamental in the study of Hotelling's generalized $T$ test [5] (a special but important case of what we shall call the general regression problem), and has proved that (locally) this test is not only unbiased but "most powerful" in a certain sense. On the other hand, it is not true that all likelihood ratio tests are unbiased [2]. Consequently, the knowledge that in a rather wide class of problems which arise in normal sampling theory the method of likelihood ratios furnishes tests which are either locally or completely unbiased would seem to be of some value, even when the exact sampling distribution of the criterion is too complicated to tabulate.


2. The regression problem with one dependent variate. Suppose that $y$ is known to be normally distributed about a linear function of the fixed variables $x^1, \dots, x^r$, so that the family of populations under consideration is characterized by a distribution function of the form
$$f(y \mid x; b, \sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\Big[-\frac{1}{2\sigma^2}\Big(y - \sum_{i=1}^{r} b_i x^i\Big)^2\Big], \tag{2.1}$$
where the set of admissible values of $\sigma^2$ and the $b$'s is
$$\Omega: \ 0 < \sigma^2 < \infty, \quad -\infty < b_i < \infty.$$


Let $H$ be the hypothesis that the point $(\sigma^2, b_1, \dots, b_r)$ lies in the subset of $\Omega$ defined by
$$\omega: \ b_{q+1} = b_{q+2} = \dots = b_r = 0.$$
The likelihood ratio appropriate to testing the hypothesis $H$ on the basis of $N$ $(N > r)$ independent observations drawn from such a population is then
$$\lambda = \left[\frac{\min_{\Omega}\sum_{\alpha=1}^{N}\big(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\big)^2}{\min_{\omega}\sum_{\alpha=1}^{N}\big(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\big)^2}\right]^{N/2},$$
with the understanding that the values of the fixed variables $x^1_\alpha, \dots, x^r_\alpha$ associated with the $\alpha$-th observation have been so chosen that the matrix
$$\|a^{ij}\| = \Big\|\sum_{\alpha=1}^{N} x^i_\alpha x^j_\alpha\Big\|$$
is positive definite. (The expression in the numerator is the minimum of $\sum_{\alpha=1}^{N}\big(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\big)^2$ for variations of the $b$'s over $\Omega$, while the denominator contains the corresponding minimum for variations of the $b$'s over $\omega$.)
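A minimal sketch of this criterion, assuming the fixed variates are supplied as columns (one list per $x^i$, with a column of ones if a constant term is wanted) and that $\omega$ keeps the first $q$ of them. It solves the normal equations with matrix $\|a^{ij}\|$ by plain Gaussian elimination; the helper names are hypothetical, and a production version would use a numerically stabler least-squares routine.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix ||a^{ij}|| is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(y, cols):
    """min over b of sum_a (y_a - sum_i b_i x^i_a)^2, via the normal equations."""
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]  # a^{ij}
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    b = solve_linear(a, rhs)
    fitted = [sum(bi * col[k] for bi, col in zip(b, cols)) for k in range(len(y))]
    return sum((ya - fa) ** 2 for ya, fa in zip(y, fitted))


def lambda_2_over_n(y, cols, q):
    """lambda^{2/N}: residual SS over Omega divided by residual SS over omega,
    where omega sets b_{q+1} = ... = b_r = 0 (only the first q columns remain)."""
    return residual_ss(y, cols) / residual_ss(y, cols[:q])


# Example: x^1 = 1 (constant term), x^2 = 1, ..., 5; test b_2 = 0 (q = 1).
cols = [[1, 1, 1, 1, 1], [1, 2, 3, 4, 5]]
y = [1, 3, 2, 5, 4]
print(lambda_2_over_n(y, cols, q=1))  # approximately 0.36, so xi = 0.64
```

In this example the residual sums of squares are 3.6 over $\Omega$ and 10 over $\omega$, so $\xi = 1 - \lambda^{2/N}$ is the squared sample correlation of $y$ with $x^2$.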

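In order to show that the test is unbiased, we shall make use of the exact sampling distribution of the quantity
$$\xi = 1 - \lambda^{2/N},$$
first published by P. C. Tang [11]. Writing $\|a_{kl}\|$ for the inverse of the matrix $\|a^{kl}\|$ composed of the first $q$ rows and columns of $\|a^{ij}\|$, let us put
$$G = \frac{1}{2\sigma^2}\Big[\sum_{i,j=1}^{r} a^{ij} b_i b_j - \sum_{k,l=1}^{q} a_{kl}\Big(\sum_{i=1}^{r} a^{ki} b_i\Big)\Big(\sum_{j=1}^{r} a^{lj} b_j\Big)\Big].$$
Since the critical region $0 < \lambda < \lambda_\varepsilon$ corresponds to the region $1 - \lambda_\varepsilon^{2/N} = \xi_\varepsilon < \xi < 1$, it can then be shown that the probability of rejecting $H$ when the population parameters have specified values $\sigma^2, b_1, \dots, b_r$ is expressed by the series
$$I(G, \xi_\varepsilon) = e^{-G}\sum_{v=0}^{\infty}\frac{G^v}{v!}\int_{\xi_\varepsilon}^{1}\frac{z^{\frac12(r-q)+v-1}(1-z)^{\frac12(N-r)-1}}{B[\frac12(r-q)+v,\ \frac12(N-r)]}\,dz, \tag{2.2}$$
where
$$B(u, v) = \frac{\Gamma(u)\,\Gamma(v)}{\Gamma(u+v)}.$$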
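The series (2.2) is straightforward to evaluate numerically: each term weights the upper tail beyond $\xi_\varepsilon$ of a Beta law with parameters $\frac12(r-q)+v$ and $\frac12(N-r)$ by the Poisson probability $e^{-G}G^v/v!$. The sketch below assumes the regularized incomplete Beta function is computed by the standard continued-fraction expansion (modified Lentz method, as in common numerical references); all function names are illustrative.

```python
import math


def _beta_cf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # even step of the continued fraction
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """B(a, b; x) / B(a, b), the regularized incomplete Beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def rejection_probability(G, xi_eps, N, r, q, terms=400):
    """Series (2.2): sum over v of e^{-G} G^v / v! times P(Beta(a0 + v, b) > xi_eps),
    with a0 = (r - q)/2 and b = (N - r)/2."""
    a0, b = 0.5 * (r - q), 0.5 * (N - r)
    total = 0.0
    for v in range(terms):
        if G == 0.0:
            weight = 1.0 if v == 0 else 0.0
        else:
            # Poisson weight e^{-G} G^v / v!, computed on the log scale
            weight = math.exp(-G + v * math.log(G) - math.lgamma(v + 1))
        if weight == 0.0 and v > G:
            break  # remaining Poisson weights are negligible
        total += weight * (1.0 - reg_inc_beta(a0 + v, b, xi_eps))
    return total
```

When $r - q = N - r = 2$ the Beta laws have parameters $1 + v$ and $1$, and the series sums to $1 - \xi_\varepsilon e^{-G(1-\xi_\varepsilon)}$; for example `rejection_probability(1.0, 0.3, 4, 2, 0)` should agree with $1 - 0.3e^{-0.7} \approx 0.851$. Increasing `G` increases the result, as the argument following (2.2) shows.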


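Now $G$ is a positive definite quadratic form in the parameters $b_{q+1}, \dots, b_r$, so that it vanishes if and only if the hypothesis is true. And if $0 < \xi_\varepsilon < 1$, then $I(G, \xi_\varepsilon)$ is a monotone increasing function of $G$. For by differentiating (2.2) we obtain
$$\frac{\partial I(G, \xi_\varepsilon)}{\partial G} = e^{-G}\sum_{v=0}^{\infty}\frac{G^v}{v!}\left\{\int_{\xi_\varepsilon}^{1}\frac{z^{\frac12(r-q)+v}(1-z)^{\frac12(N-r)-1}}{B[\frac12(r-q)+v+1,\ \frac12(N-r)]}\,dz - \int_{\xi_\varepsilon}^{1}\frac{z^{\frac12(r-q)+v-1}(1-z)^{\frac12(N-r)-1}}{B[\frac12(r-q)+v,\ \frac12(N-r)]}\,dz\right\}, \tag{2.3}$$
and from a property of incomplete Beta functions, which we shall demonstrate in the next section, it follows that each term in the series (2.3) is positive. Accordingly we have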
Theorem I. The likelihood ratio test for the hypothesis that in a population of type (2.1) certain of the regression coefficients are zero, i.e., the hypothesis that $y$ is independent of the fixed variables $x^{q+1}, \dots, x^r$, is completely unbiased.

Wilks [15] has noted that the ordinary analysis of variance and covariance amounts essentially to testing hypotheses of this nature by means of the function
$$F = \frac{N - r}{r - q}\cdot\frac{1 - \lambda^{2/N}}{\lambda^{2/N}}.$$
Consequently such tests are also completely unbiased, since $F$ is an increasing function of $\xi = 1 - \lambda^{2/N}$ and the region of rejection is then taken to be of the form $F > F_\varepsilon$.

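3. An inequality relating to incomplete Beta functions. Let us write
$$B(u, v; t) = \int_0^t z^{u-1}(1-z)^{v-1}\,dz \qquad (0 \le t \le 1),$$
so that $B(u, v; 1) = B(u, v)$. Now, integrating by parts,
$$\int_0^t z^u(1-z)^{v-1}\,dz = -\frac{1}{v}\,t^u(1-t)^v + \frac{u}{v}\int_0^t z^{u-1}(1-z)^v\,dz.$$
The integrated term on the right is non-positive, so that
$$B(u, v+1; t) \ge \frac{v}{u}\,B(u+1, v; t), \tag{3.1}$$
in which the equality holds if and only if $t = 0$ or $t = 1$. Again, since
$$z^u(1-z)^{v-1} + z^{u-1}(1-z)^v = z^{u-1}(1-z)^{v-1},$$
we have
$$B(u+1, v; t) + B(u, v+1; t) = B(u, v; t). \tag{3.2}$$
Combining these results, we find that
$$B(u+1, v; t) \le \frac{u}{u+v}\,B(u, v; t), \tag{3.3}$$
with equality only when $t = 0$ or $t = 1$. Since $B(u+1, v) = \frac{u}{u+v}B(u, v)$, we have

Lemma 1. If $0 < t < 1$, then
$$\frac{B(u+1, v; t)}{B(u+1, v)} < \frac{B(u, v; t)}{B(u, v)}.$$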
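Lemma 1, the identity (3.2), and the inequality (3.1) can be spot-checked numerically. The sketch assumes $u, v \ge 1$, so that a plain composite Simpson rule is adequate for $B(u, v; t)$; the function names are illustrative, and a check of a few cases is of course no substitute for the proof above.

```python
import math


def inc_beta(u, v, t, n=2000):
    """B(u, v; t) = integral_0^t z^(u-1) (1-z)^(v-1) dz by composite Simpson's rule.
    Assumes u >= 1 and v >= 1, so the integrand is bounded on [0, t]; n must be even."""
    if t <= 0.0:
        return 0.0
    h = t / n
    f = lambda z: z ** (u - 1) * (1.0 - z) ** (v - 1)
    total = f(0.0) + f(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0


def complete_beta(u, v):
    """B(u, v) = Gamma(u) Gamma(v) / Gamma(u + v)."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))


def lemma1_holds(u, v, t):
    """Lemma 1: B(u+1, v; t) / B(u+1, v) < B(u, v; t) / B(u, v) for 0 < t < 1."""
    left = inc_beta(u + 1, v, t) / complete_beta(u + 1, v)
    right = inc_beta(u, v, t) / complete_beta(u, v)
    return left < right


print(lemma1_holds(2, 3, 0.4))  # prints True
```

In probabilistic terms the lemma says that a Beta law with parameters $(u+1, v)$ puts less mass on $(0, t)$ than one with parameters $(u, v)$, which is exactly what makes each term of (2.3) positive.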

4. The multiple correlation coefficient. Suppose the distribution law of the underlying population is known to be of the form
$$f(x^1, \dots, x^t \mid x^{t+1}, \dots, x^m) = \frac{|B_{ij}|^{1/2}}{(2\pi)^{t/2}}\exp\big[-\tfrac12 B_{ij}\,y^i y^j\big], \tag{4.1}$$
where $y^1 = x^1 - a^1 - C_p x^p$ and $y^k = x^k - a^k$ $(k = 2, \dots, t)$.

The indices appearing in this expression take the values $i, j = 1, \dots, t$ and $p, q = t + 1, \dots, m$. The summation convention for repeated indices will be used; for example, $\sum_{p=t+1}^{m} C_p x^p$ will be denoted by $C_p x^p$. We shall also have occasion to use indices $r, s$ with the range $r, s = 1, \dots, m$. The set of possible values of the $a$'s, $B$'s, and $C$'s is
$$\Omega: \ \|B_{ij}\| \text{ positive definite}; \quad -\infty < a^i < \infty; \quad -\infty < C_p < \infty.$$


We shall consider the $\lambda$ test for the hypothesis $H$ that $x^1$ is independent of the remaining variables $x^2, \dots, x^m$, i.e., that the parameters belong to that subset of $\Omega$ defined by
$$\omega: \ B_{1k} = 0 \ (k = 2, \dots, t); \quad C_p = 0.$$


Let us write $u^{rs} = \sum_{\alpha=1}^{N}(x^r_\alpha - \bar{x}^r)(x^s_\alpha - \bar{x}^s)$, and assume that the values of the fixed variables $x^p_\alpha$ have been so selected that the matrix $\|u^{pq}\|$ is positive definite. The likelihood ratio can then be expressed in the form
$$\lambda^{2/N} = \frac{|u^{rs}|}{u^{11}\,U_{11}} = 1 - R^2,$$
where $U_{11}$ is the complement (cofactor) of $u^{11}$ in the determinant $|u^{rs}|$. If $N > m + 1$, the general sampling distribution of $R^2$ ($R$ being the multiple correlation coefficient between $x^1$ and the $m - 1$ other variates), for this case in which $x^1, \dots, x^t$ are subject to sampling variation and the remainder are fixed, is
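The identity $\lambda^{2/N} = |u^{rs}|/(u^{11}U_{11}) = 1 - R^2$ can be illustrated directly. This sketch (hypothetical helper names) forms the centered cross-product matrix, computes determinants by Gaussian elimination, and takes $U_{11}$ as the minor obtained by deleting the first row and column.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def centered_cross_products(columns):
    """u^{rs} = sum_a (x^r_a - mean^r)(x^s_a - mean^s) for a list of variate columns."""
    means = [sum(col) / len(col) for col in columns]
    dev = [[x - mu for x in col] for col, mu in zip(columns, means)]
    return [[sum(a * b for a, b in zip(dr, ds)) for ds in dev] for dr in dev]


def one_minus_r_squared(columns):
    """lambda^{2/N} = |u^{rs}| / (u^{11} U_11) = 1 - R^2, where R is the multiple
    correlation of the first column with all the others."""
    u = centered_cross_products(columns)
    minor = [row[1:] for row in u[1:]]  # U_11: delete the first row and column
    return determinant(u) / (u[0][0] * determinant(minor))


y, x2, x3 = [1, 3, 2, 5, 4], [1, 2, 3, 4, 5], [1, -1, 0, -1, 1]
print(one_minus_r_squared([y, x2, x3]))  # approximately 0.135
```

Here $x^2$ and $x^3$ are uncorrelated in the sample, so $R^2$ is the sum of the squared simple correlations, $0.64 + 0.225$, and $1 - R^2 = 0.135$.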


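$$dF = e^{-\gamma}\sum_{v=0}^{\infty}\sum_{\mu=0}^{\infty}\frac{\gamma^v}{v!}\,\frac{\Gamma[\frac12(N-1)+\mu+v]}{\mu!\,\Gamma[\frac12(N-1)+v]}\,\bar\rho^{2\mu}(1-\bar\rho^2)^{\frac12(N-1)+v}\,\frac{(R^2)^{\frac12(m-1)+\mu+v-1}(1-R^2)^{\frac12(N-m)-1}}{B[\frac12(m-1)+\mu+v,\ \frac12(N-m)]}\,d(R^2), \tag{4.2}$$
where
$$1 - \bar\rho^2 = \frac{1}{B_{11}B^{11}}, \qquad \gamma = \frac{C_p C_q u^{pq}}{2B^{11}}, \qquad \|B^{ij}\| = \|B_{ij}\|^{-1}.$$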


This distribution was first obtained by Wilks [13], although Fisher [3] had previously treated the two extreme cases in which (1) all independent variables are subject to sampling fluctuation, and (2) all independent variables are fixed.

To simplify the presentation, let us write $\rho$ for $\bar\rho^2$ and $R$ for $R^2$, and note that $\gamma = 0$ if and only if $C_p = 0$ $(p = t + 1, \dots, m)$, while $\rho = 0$ if and only if $B_{1k} = 0$ $(k = 2, \dots, t)$, so that $\gamma = \rho = 0$ means that the hypothesis $H$ is true. On any alternative hypothesis, one or the other or both of these quantities will be positive. Let the region of rejection be taken to be
$$R_\varepsilon < R < 1,$$
which corresponds to
$$0 < \lambda < (1 - R_\varepsilon)^{N/2}.$$
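The probability of rejecting $H$ is then
$$I(\rho, \gamma, R_\varepsilon) = e^{-\gamma}\sum_{v=0}^{\infty}\frac{\gamma^v}{v!}\sum_{\mu=0}^{\infty}\frac{\Gamma[\frac12(N-1)+\mu+v]}{\mu!\,\Gamma[\frac12(N-1)+v]}\,\rho^\mu(1-\rho)^{\frac12(N-1)+v}\int_{R_\varepsilon}^{1}\frac{R^{\frac12(m-1)+\mu+v-1}(1-R)^{\frac12(N-m)-1}}{B[\frac12(m-1)+\mu+v,\ \frac12(N-m)]}\,dR. \tag{4.4}$$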


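We shall show that $I(\rho, \gamma, R_\varepsilon)$ is a strictly monotone increasing function of $\rho$ for each $\gamma$, and that $I(0, \gamma, R_\varepsilon)$ is a strictly monotone increasing function of $\gamma$.

First consider $\partial I/\partial\rho$. We can write (4.4) in the form
$$I(\rho, \gamma, R_\varepsilon) = e^{-\gamma}\sum_{v=0}^{\infty}\frac{\gamma^v}{v!\,\Gamma[\frac12(N-1)+v]}\sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}(1-\rho)^{\frac12(N-1)+v}\,\varphi_{\mu v},$$
where
$$\varphi_{\mu v} = \Gamma[\tfrac12(N-1)+\mu+v]\left\{1 - \frac{B[\frac12(m-1)+\mu+v,\ \frac12(N-m);\ R_\varepsilon]}{B[\frac12(m-1)+\mu+v,\ \frac12(N-m)]}\right\}.$$
Then, formally,
$$\frac{\partial}{\partial\rho}\sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}(1-\rho)^{\frac12(N-1)+v}\varphi_{\mu v} = \sum_{\mu=1}^{\infty}\frac{\rho^{\mu-1}}{(\mu-1)!}(1-\rho)^{\frac12(N-1)+v}\varphi_{\mu v} - \sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}\big[\tfrac12(N-1)+v\big](1-\rho)^{\frac12(N-1)+v-1}\varphi_{\mu v}.$$
Taking out the factor $(1-\rho)^{\frac12(N-1)+v-1}$, we have left
$$\sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}\varphi_{\mu+1,v} - \sum_{\mu=0}^{\infty}\frac{\rho^{\mu+1}}{\mu!}\varphi_{\mu+1,v} - \sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}\big[\tfrac12(N-1)+v\big]\varphi_{\mu v} = \sum_{\mu=0}^{\infty}\frac{\rho^\mu}{\mu!}\Big\{\varphi_{\mu+1,v} - \big[\tfrac12(N-1)+\mu+v\big]\varphi_{\mu v}\Big\},$$
since $\sum_{\mu}\rho^{\mu+1}\varphi_{\mu+1,v}/\mu! = \sum_{\mu}\mu\,\rho^{\mu}\varphi_{\mu v}/\mu!$. And the expression $\varphi_{\mu+1,v} - [\frac12(N-1)+\mu+v]\varphi_{\mu v}$ is the same as
$$\Gamma[\tfrac12(N-1)+\mu+v+1]\left\{\frac{B[\frac12(m-1)+\mu+v,\ \frac12(N-m);\ R_\varepsilon]}{B[\frac12(m-1)+\mu+v,\ \frac12(N-m)]} - \frac{B[\frac12(m-1)+\mu+v+1,\ \frac12(N-m);\ R_\varepsilon]}{B[\frac12(m-1)+\mu+v+1,\ \frac12(N-m)]}\right\},$$
and is therefore positive, by Lemma 1. Consequently
$$\frac{\partial I(\rho, \gamma, R_\varepsilon)}{\partial\rho} \ge 0,$$
with equality holding only if $\rho = 1$, or if the critical region is taken as the whole interval or the null set.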
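We have yet to investigate $I(0, \gamma, R_\varepsilon)$. In this case (4.4) becomes
$$I(0, \gamma, R_\varepsilon) = e^{-\gamma}\sum_{v=0}^{\infty}\frac{\gamma^v}{v!}\int_{R_\varepsilon}^{1}\frac{R^{\frac12(m-1)+v-1}(1-R)^{\frac12(N-m)-1}}{B[\frac12(m-1)+v,\ \frac12(N-m)]}\,dR. \tag{4.5}$$
(Note that this agrees with (2.2) if we make use of the relations $r = m$, $q = 1$, and $G = \gamma$.) We then obtain
$$\frac{\partial I(0, \gamma, R_\varepsilon)}{\partial\gamma} = e^{-\gamma}\sum_{v=0}^{\infty}\frac{\gamma^v}{v!}\left\{\frac{B[\frac12(m-1)+v,\ \frac12(N-m);\ R_\varepsilon]}{B[\frac12(m-1)+v,\ \frac12(N-m)]} - \frac{B[\frac12(m-1)+v+1,\ \frac12(N-m);\ R_\varepsilon]}{B[\frac12(m-1)+v+1,\ \frac12(N-m)]}\right\},$$
which the lemma shows to be positive when $0 < R_\varepsilon < 1$.

This concludes the proof of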

This concludes the proof of 

Theorem II. If the underlying population has a distribution law of the form (4.1), then the likelihood ratio test for the hypothesis that $x^1$ is independent of $x^2, \dots, x^m$, where $x^{t+1}, \dots, x^m$ are fixed and $x^2, \dots, x^t$ are subject to sampling variation, is completely unbiased.
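The power function of section 4 can also be evaluated numerically. The sketch below assumes the double-series form used there: Poisson weights $e^{-\gamma}\gamma^v/v!$, negative-binomial weights $\Gamma[\frac12(N-1)+\mu+v]\,\rho^\mu(1-\rho)^{\frac12(N-1)+v}/\{\mu!\,\Gamma[\frac12(N-1)+v]\}$, and upper tails beyond $R_\varepsilon$ of Beta laws with parameters $\frac12(m-1)+\mu+v$ and $\frac12(N-m)$. The incomplete Beta function uses the standard continued fraction; names are illustrative, and terms of negligible weight are skipped to keep it fast.

```python
import math


def _beta_cf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """B(a, b; x) / B(a, b), the regularized incomplete Beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def _log_weight(rho, gamma, c, mu, v):
    """log of e^{-gamma} gamma^v/v! * Gamma(c+mu+v)/(mu! Gamma(c+v)) * rho^mu (1-rho)^(c+v)."""
    log_w = -gamma - math.lgamma(v + 1) - math.lgamma(mu + 1)
    log_w += math.lgamma(c + mu + v) - math.lgamma(c + v)
    log_w += (c + v) * math.log1p(-rho)
    if v > 0:
        log_w += v * math.log(gamma)  # reached only when gamma > 0
    if mu > 0:
        log_w += mu * math.log(rho)   # reached only when rho > 0
    return log_w


def rejection_probability_r2(rho, gamma, r_eps, N, m, mu_terms=300, v_terms=100):
    """Probability that R^2 exceeds r_eps, for 0 <= rho < 1 and gamma >= 0."""
    c, a0, b = 0.5 * (N - 1), 0.5 * (m - 1), 0.5 * (N - m)
    total = 0.0
    for v in range(v_terms if gamma > 0.0 else 1):
        for mu in range(mu_terms if rho > 0.0 else 1):
            w = math.exp(_log_weight(rho, gamma, c, mu, v))
            if w < 1e-18:
                continue  # negligible term; skip the incomplete Beta evaluation
            total += w * (1.0 - reg_inc_beta(a0 + mu + v, b, r_eps))
    return total
```

For $N = 5$, $m = 3$ every Beta law has second parameter 1 and the double series sums in closed form to $1 - R_\varepsilon\big(\frac{1-\rho}{1-\rho R_\varepsilon}\big)^2\exp\big[-\gamma\big(1 - \frac{(1-\rho)R_\varepsilon}{1-\rho R_\varepsilon}\big)\big]$, which gives a convenient check. Increasing either $\rho$ or $\gamma$ from zero increases the rejection probability, in line with Theorem II.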


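5. Mutual independence of several sets of random variables. Let the distribution law of the $m$-variate population be of the form
$$f(x^1, \dots, x^m) = \frac{|B_{ij}|^{1/2}}{(2\pi)^{m/2}}\exp\big[-\tfrac12 B_{ij}(x^i - a^i)(x^j - a^j)\big]. \tag{5.1}$$
Here $\Omega$ is the set: $\|B_{ij}\|$ positive definite; $-\infty < a^i < \infty$. Suppose we wish to test the hypothesis $H_I$ that the variates $(x^1, \dots, x^{m_1}), (x^{m_1+1}, \dots, x^{m_2}), \dots, (x^{m_{p-1}+1}, \dots, x^{m_p})$ are mutually independent in sets [14], where $0 = m_0 < m_1 < \dots < m_p = m$. Then the $\omega$ set is that defined by
$$\|B_{ij}\| = \|B_{i_1 j_1}\| \dotplus \|B_{i_2 j_2}\| \dotplus \dots \dotplus \|B_{i_p j_p}\| = \|B_1\| \dotplus \dots \dotplus \|B_p\|,$$
that is, we have $B_{ij} = 0$ unless the indices $i$ and $j$ both relate to the same set of variates.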

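Associated with the population of random samples $O_N$ $(N > m + 1)$ drawn from a universe characterized by (5.1), we have the distribution function
$$P = \frac{|B_{ij}|^{N/2}}{(2\pi)^{mN/2}}\exp\Big[-\tfrac12\sum_{\alpha=1}^{N} B_{ij}(x^i_\alpha - a^i)(x^j_\alpha - a^j)\Big].$$
(Summation over the repeated indices $i, j$ in accordance with the usual convention is indicated here and below.) The maximum of $P$ with respect to variations of the parameters $B_{ij}$, $a^i$ in $\Omega$ is
$$P_\Omega = \Big(\frac{N}{2\pi}\Big)^{mN/2}\frac{e^{-mN/2}}{|v^{ij}|^{N/2}},$$
where
$$v^{ij} = \sum_{\alpha=1}^{N}(x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j).$$
And the maximum when the parameters are restricted to $\omega$ is
$$P_\omega = \Big(\frac{N}{2\pi}\Big)^{mN/2}\frac{e^{-mN/2}}{(v_1 v_2 \cdots v_p)^{N/2}},$$
where $v_k$ stands for the determinant of the $v$'s connected with the $k$-th set of $x$'s. Thus the appropriate likelihood ratio is given by
$$\lambda_I = \Big(\frac{|v^{ij}|}{v_1 \cdots v_p}\Big)^{N/2}.$$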
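The criterion $\lambda_I^{2/N} = |v^{ij}|/(v_1 \cdots v_p)$ takes only a few lines to compute. This sketch (illustrative names) assumes, as in the text, that the sets are consecutive; they are given by their sizes $m_1, m_2 - m_1, \dots$.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def centered_cross_products(columns):
    """v^{ij} = sum_a (x^i_a - mean^i)(x^j_a - mean^j)."""
    means = [sum(col) / len(col) for col in columns]
    dev = [[x - mu for x in col] for col, mu in zip(columns, means)]
    return [[sum(a * b for a, b in zip(di, dj)) for dj in dev] for di in dev]


def independence_criterion(columns, set_sizes):
    """lambda_I^{2/N} = |v^{ij}| / (v_1 ... v_p) for consecutive sets of variates."""
    if sum(set_sizes) != len(columns):
        raise ValueError("set sizes must add up to m")
    v = centered_cross_products(columns)
    denom, start = 1.0, 0
    for size in set_sizes:
        block = [row[start:start + size] for row in v[start:start + size]]
        denom *= determinant(block)  # v_k, the determinant of the k-th diagonal block
        start += size
    return determinant(v) / denom


cols = [[1, 3, 2, 5, 4], [1, 2, 3, 4, 5], [1, -1, 0, -1, 1]]
print(independence_criterion(cols, [1, 2]))  # approximately 0.135
```

Replacing the variates of one set by nonsingular linear combinations $T$ of one another multiplies both $|v^{ij}|$ and the corresponding $v_k$ by $(\det T)^2$, so the value is unchanged; for instance, replacing $(x^2, x^3)$ by $(x^2 + 2x^3, 3x^3 - x^2)$ in the example leaves it at 0.135.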

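It is easy to see that the value of $\lambda_I$ is unaltered if we replace $x^i_\alpha - a^i$ by $x^i_\alpha$, so that we can express the probability that $\lambda_I$ will be between 0 and $\lambda_\varepsilon$ in the form
$$I(B, \lambda_\varepsilon) = \frac{|B_{ij}|^{N/2}}{(2\pi)^{mN/2}}\int e^{-\frac12\sum_{\alpha} B_{ij}x^i_\alpha x^j_\alpha}\,dx^1_1 \cdots dx^m_N,$$
the integral being extended over the region $0 < \lambda_I < \lambda_\varepsilon$.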

Furthermore, $\lambda_I$ is invariant under the operation of replacing any $x$ by a linear combination of $x$'s belonging to the same set. And since the assumption that $\|B_{ij}\|$ is positive definite implies that the matrices $\|B_{i_k j_k}\|$ have the same property, we can transform the $x$'s in each set among themselves by linear transformations in such a way as to reduce each of the expressions
$$B_{i_k j_k}x^{i_k}x^{j_k} \qquad (k = 1, \dots, p)$$
to sums of squares. Thus we have
$$I(B, \lambda_\varepsilon) = \frac{|\beta_{ij}|^{N/2}}{(2\pi)^{mN/2}}\int e^{-\frac12\sum_{\alpha}\beta_{ij}x^i_\alpha x^j_\alpha}\,dx^1_1 \cdots dx^m_N = I(\beta, \lambda_\varepsilon), \tag{5.2}$$
where
$$\beta_{i_k i_k} = 1, \tag{5.3}$$
$$\beta_{i_k j_k} = 0 \qquad (i_k \ne j_k;\ i_k, j_k = m_{k-1} + 1, \dots, m_k), \tag{5.4}$$
and the subscripts on the indices indicate the sets of values over which they range; e.g., $i_2$ runs over the numbers corresponding to the columns of the matrix $\|B_2\|$. From (5.3) and (5.4) it is clear that $\|\beta_{ij}\|$ reduces to a diagonal matrix when $H$ is true.

In order to show that the test is locally unbiased, we may consider the derivatives of $I(\beta, \lambda_\varepsilon)$ with respect to the elements $\beta_{i_p j_q}$ $(p \ne q)$ lying outside the diagonal blocks, evaluated at the diagonal form associated with $H$. And since, whenever the sample point $(x^1_1, \dots, x^m_N)$ is in the region $\lambda_I < \lambda_\varepsilon$, so also is the point obtained from it by changing the signs of $x^{i_p}_1, \dots, x^{i_p}_N$, it follows that
$$\left(\frac{\partial I}{\partial\beta_{i_p j_q}}\right)_0 = 0 \qquad (p \ne q).$$
Similar considerations show that the mixed second derivatives
$$\left(\frac{\partial^2 I}{\partial\beta_{i_p j_q}\,\partial\beta_{k_r l_s}}\right)_0 \qquad \big((i_p, j_q) \ne (k_r, l_s)\big)$$
must vanish.

The sample quantities $v^{ij}$ have the joint distribution
$$G(B, N-1, m)\,V(B, N-1, m)\,dv, \tag{5.5}$$
where
$$G(B, n, m) = \frac{|B_{ij}|^{n/2}}{2^{mn/2}\,\pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma[\frac12(n+1-i)]}, \qquad V(B, n, m) = |v^{ij}|^{\frac12(n-m-1)}\,e^{-\frac12 B_{ij}v^{ij}}.$$
With the aid of (5.5) we shall now obtain the moments
$$E\Big[\Big(\frac{|v^{ij}|}{v_1 \cdots v_p}\Big)^h\Big], \qquad h = 0, 1, \dots,$$
for the case in which the matrix $\|B_{ij}\|$ has the form
$$\|B_{ij}\| = \|B_1\| \dotplus \|\mathfrak{B}\|, \text{ except that } B_{1m} = B_{m1} \ne 0, \tag{5.6}$$
where $\|\mathfrak{B}\|$ stands for $\|B_2\| \dotplus \dots \dotplus \|B_p\|$, and all other $B$'s, except those indicated, are zero. Let us designate by $(\nu)$ the set of $v^{ij}$ which correspond to the rows and columns of $\mathfrak{B}$, and by $(v - \nu)$ the remaining $v$'s. We then remark that the result of integrating (5.5) with respect to the $v$'s in $(v - \nu)$ is to reduce it to the corresponding distribution for the variables in the set $\nu$, thus:
$$\int G(B, N-1, m)\,V(B, N-1, m)\,d(v - \nu) = G(\bar B, N-1, m-m_1)\,V(\bar B, N-1, m-m_1), \tag{5.7}$$
where $\|\bar B_{ij}\|$ is the inverse of the matrix obtained by inverting $\|B_{ij}\|$ and striking out the first $m_1$ rows and columns, that is
$$\|\bar B_{ij}\| = \|B^{ij}\|^{-1} \qquad (i, j = m_1 + 1, \dots, m).$$

Then, 


G{B, N-I, m) I 



V(B, N - I, m) d{v - 5) 


can be written as 


(5.8) 


XV{B,N-l+2h,m)div- 5) 


Xvi'' Vp''Y{B,N — 1 + 2h, m — mi). 

It can be seen from (5.6) that


since of all the rows and columns of $\|\bar B_{ij}\|$ which are involved in $\|\mathfrak{B}\|$ it is only the last in which a nonzero element appears outside of the diagonal blocks. Consequently, the $v$'s corresponding to the determinants $v_1, \dots, v_p$ are independently distributed, so that if in (5.8) we integrate out all the remaining $v$'s but these, we shall be left with a product of factors
remaining v’s but these, we shall bo left with a product of factors 

G(B, N - I, m) fp 0(Bt , N-l + 2 h, k,) 

G{B,N~l + 2h, m)' ri G[Bi , iV - 1, fci) 

X Q{Bt,N - 1, hi)v7''V{B,, N -\ + 2h, k,) 

+ ^ N-l + 2h, kp), 

G(Bp,N ~ l,kp) 

where $k_i$ stands for the order of $\|B_i\|$. And this, when integrated with respect to the $v$'s in $v_1, \dots, v_p$, yields

G{B, N-l,m) fp GiBt,N -l + 2h, kd ^ G{Bp, N - I + 2h, kp) 
G{B, N-l + 2h,my fi GiBi, N - 1 , ,N- l,kp)^ ' 


which, because of the definition of the G’s, reduces to 


n 


vm -i) + h] 
vm - 1)] 


■nn 


vm - m 
nm - i) +T] 


xb~"-bS ... 


Denoting the product of ratios of $\Gamma$'s by $K_h$, and recalling the form of $\|\bar B_{ij}\|$, we therefore have


(5.9) 

with 


^r-r-^l = KhS^pB'- 

_V} ... l/pj 


I5'|| = 


Bll ... Bimi 0 . • . OBxm 
0 


Bmil • • 

0 .. 

6 

■BmlO . • 


‘ Bmimi 0 • . . 0 

. 0 


But it is not difficult to see that under the condition (5.6), the matrix $\|\bar B_{ij}\|$ is also the inverse of the matrix obtained by striking out the first $m_1$ rows and columns in the inverse of $\|B'\|$. Making use of this relation, we can apply the Jacobi theorem to (5.9), and put that expression in the form


where $\|B'_1\|$ is the matrix in the upper left-hand corner of $\|B'\|$, namely
Let the subscript $\beta$ on a $B$ stand for the result of replacing $B_{ij}$ by $B_{ij} + \beta_{ij}$. For sufficiently small values of the $\beta$'s the matrix $\|B_\beta\|$ will still be positive definite, so that we shall have


m 

.i(« 


/ 


4 




which we can put in the form 


(5.10) 


X' 


/ 


A 

Vp 






Wilks [13] has shown how to generate moments of determinants by the device of replacing $\beta_{ij}$ by $\beta_{ij} + \xi_i\xi_j$ and integrating with respect to the $\xi$'s from $-\infty$ to $\infty$. Applying this process $2h$ times to the left-hand side of (5.10) gives

/ (jr:bT,)* ^ -1. ”) *. 

which when multiplied by yields 

when the $\beta$'s are set equal to zero.

To obtain the value of this expression, we may perform the same operations on the right-hand side of (5.10). But before so doing, we shall put $B_\beta$ in a more convenient form. We have


Bp = B^p.E~ Bl^.EE""^ Bt^p\ 

where 5 is the inverse element of Bmm in j] 5 ||, and BiJ* is the cofactor of 
Blip m Bip , the result being obtained by expanding Bp according to minors of 
the first row and first column. Similarly, 


(5.11) 

From (5.11) we have 


B = Bi.E-Bl^.EE"”'.B'i^\ 



d 2 nmrn -Oi 


Bj^ 

Bi 


} 


so that if we put iJ. Bi' •. = A, we find that 


Bp = Bip-eIi - bLE'””'.^'.^,.-^ 
I El Bip. 

= A4-|.a-A)f}. 
Thus the result of multiplying (5.10) through by the corresponding determinant factor (where no $\beta$'s are substituted in this determinant) can be put in the form


,(5.12) 


/ B Y'"- 
Bj 


1 - 




- 1 (W-U 


B 


ifi. 


Expanding the expression in curly brackets, we get




TP rm - 1 ) + p] 

a v\T[h{N - 1)1 ' 





(1 - A)'. 


If we let $B_{1\beta t}$ stand for the result of replacing $B_{11}$ by $B_{11} - t$ in $B_{1\beta}$, we can write this as


I 

(5.13)


1 i(w-i) V — 1 ) + v] 


(1 - A)'(B(”)“'5}‘^-'’+' 


r[§(Af — 1 ) + A] a' 

nm - 1) +T +">'] at" 


the derivatives being evaluated at t = 0 . 

Now Wilks' results show that the operation of introducing $\beta_{ij} + \xi_i\xi_j$ into $B_{1\beta t}$ to replace $\beta_{ij}$ and integrating with respect to the $\xi$'s, when repeated $2h$ times, produces




fv mN - m 

mN -i) + h] 


when the $\beta$'s are finally set equal to zero. Reversing the order of summation, differentiation and integration in (5.13), we thus obtain


.’"I'l JJ r[^(iV — ^)] r[MA^ ~ 1) d" r] 


(5.14) 


Now 


ni{N -i) + h] 


tTo vir[KW - 1)1 


X (1 - AyiBi^y r[i(iv' -i) + h ] (^ r-kv-dN 

m(Ar-i) + /i + r]Vaf' /o’ 



nm - 1 ) + r] 
nm ~ 1 )] 


(sI'VBr'*'"''*’'*'''', 


so that (5.14) becomes


ft - i)] 


r[KN -i) + h] 


I J(iV-D 


f r[i(.v - 1) 1 

ririKW - 1)1 


X fi — av — 1) - |- /i] r[KiV — i) + r] 

nUN -i) + h-{- »']'“T[F(iv- 1 )] ' 
From this it appears that the $h$-th moment of $\lambda_I^{2/N}$ is given by
_ IT TT r[KA^ — i)] 

^ ^ " fi rm - ^)] fi m{N - i) + A] 

nm - 1 ) + r] 

_ U — ■'V 

v -0 


(5.15) 


X Z) (1 - A) 


X 


v\T[^iN - 1)] 

nm -!) + >'] r[KA^ -i) + h] 


r[^(i\r -\) + h + v] nm - 1 )] 


A considerable amount of cancellation will take place in (5.15), for $m$ is greater than any $k_i$. Suppose the largest $k_i$ is $k_r$. Then we can cancel its product into the first one, with the assurance that there will be at least one factor
$$\frac{\Gamma[\frac12(N-1)]}{\Gamma[\frac12(N-1)+h]} \tag{5.16}$$
to cancel the corresponding factor under the summation sign. Hence we have


(5.17) 




TT r[KA’ — ^) + A] 1 ^, r[j(JV — i)] 

.-i-+i miN - 1)] 'L\ v[hiN -i) + h] 


V Y' ('1 _ .y r[|(jV — 1) + r] — l) + r] 

^ ^ ^ r!r[KiV-l)] 'nuN ~ 1) + ;i + r]' 


where $\prod'$ indicates that the product for $k_r$ has been omitted, and $\prod''$ indicates that one factor (5.16) has been cancelled. Then we can take out the factor $i = m$ in the first product, putting it under the summation sign, where, together with the final factor in each term of the sum, it gives rise to the combination
$$\frac{\Gamma[\frac12(N-1)+v]\,\Gamma[\frac12(N-m)+h]\,\Gamma[\frac12(m-1)+v]}{\Gamma[\frac12(N-m)]\,\Gamma[\frac12(m-1)+v]\,\Gamma[\frac12(N-1)+h+v]} = \frac{B[\frac12(N-m)+h,\ \frac12(m-1)+v]}{B[\frac12(N-m),\ \frac12(m-1)+v]}.$$
After making this reduction, we obtain


(5.18) 




n vm - *) + A] fr, fv, Tim - oi 
.- 1-^+1 nm-i)] i-i Tim -1-) + h] 


N/ iiOv- 1 ) V fi A V rtKAT — 1) + r] Bl^iN -m) +h, |(to - 1) + v] 

^ ^ rlr[KAr - 1)] Bim - m), m - 1) + V] 


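The products of ratios in the first part of (5.18) are of the type discussed by Wilks in connection with integral equations of type B [12]. It follows from his results that $\lambda_I^{2/N}$ is distributed like the product
$$z\,\theta_1\theta_2 \cdots \theta_{m'} \qquad (m' = m - k_r - 1),$$
where $z$ and the $\theta$'s are independently distributed, with the distribution of the $\theta$'s given by
$$f(\theta_1, \dots, \theta_{m'}) = \prod_{i=1}^{m'}\frac{\Gamma(c_i)}{\Gamma(b_i)\,\Gamma(c_i - b_i)}\,\theta_i^{b_i-1}(1-\theta_i)^{c_i-b_i-1},$$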
where the $b_i$ and $c_i$ are constants which depend on $N$, $m$, and the sizes of the blocks, but not on $A$, and the distribution of $z$ is given by
$$f(z) = A^{\frac12(N-1)}\sum_{v=0}^{\infty}(1-A)^v\,\frac{\Gamma[\frac12(N-1)+v]}{v!\,\Gamma[\frac12(N-1)]}\,\frac{z^{\frac12(N-m)-1}(1-z)^{\frac12(m-1)+v-1}}{B[\frac12(N-m),\ \frac12(m-1)+v]}.$$


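Consequently, the probability that $\lambda_I$ lies between zero and $\lambda_\varepsilon$ is
$$\int_{S'} f(z)\,f(\theta_1, \dots, \theta_{m'})\,dz\,d\theta_1 \cdots d\theta_{m'},$$
where the integral is to be extended over the region
$$S': \ 0 < z\,\theta_1 \cdots \theta_{m'} < \lambda_\varepsilon^{2/N}, \quad 0 < \theta_i < 1, \quad 0 < z < 1.$$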

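Let us integrate first with respect to $z$ and then with respect to the $\theta$'s; we have
$$I(A, \lambda_\varepsilon) = \int_{S_\theta} A^{\frac12(N-1)}\sum_{v=0}^{\infty}(1-A)^v\,\frac{\Gamma[\frac12(N-1)+v]}{v!\,\Gamma[\frac12(N-1)]}\,\frac{B[\frac12(N-m),\ \frac12(m-1)+v;\ \varphi(\theta)]}{B[\frac12(N-m),\ \frac12(m-1)+v]}\,f(\theta)\,d\theta, \tag{5.19}$$
where $S_\theta$ is the set $0 < \theta_i < 1$, and
$$B(u, v; \varphi) = \int_0^{\varphi} z^{u-1}(1-z)^{v-1}\,dz = B(u, v) - \int_{\varphi}^{1} z^{u-1}(1-z)^{v-1}\,dz = B(u, v) - B(v, u; 1 - \varphi), \tag{5.20}$$
$\varphi(\theta) = \min\{1,\ \lambda_\varepsilon^{2/N}/(\theta_1 \cdots \theta_{m'})\}$ being the upper limit for $z$ for fixed $\theta$. It is clear that the subset of $S_\theta$ for which $\varphi(\theta) < 1$ will not be of measure zero in the $\theta$-space, since we assume that $0 < \lambda_\varepsilon < 1$.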

The relation between (5.19) and the corresponding expression for the multiple correlation coefficient without fixed variates (the case $\gamma = 0$ in (4.4)) may be clearer if we put


(5.21) 


P = 1 - A = 


where $\mathfrak{B}^{mm}$ is the inverse element of $\mathfrak{B}_{mm}$ in $\|\mathfrak{B}\|$, and $B_1^{11}$ is the inverse element of $B_{11}$ in $\|B_1\|$. Then the required probability of rejection when $\rho$ has any fixed value is

Up, 1 - \T) = f Z (1 - r[ KAf - 1) + r] 

Jse »=o r! 


r[KA^ -T)r 

B[Km - 1) + P, UN 


m), 1 - 


B[Km — 1) -jl r, K-1^ ~ m] 


7(0) dd, 


where we have used the relation (5.20) between the incomplete Beta functions. Differentiating with respect to $\rho$ before performing the integration with respect to the $\theta$'s, we find by a computation similar to that in section 4 that each term in the series is positive except where $\varphi(\theta) = 1$; so that we have
$$\frac{\partial}{\partial\rho}I(\rho, 1 - \lambda_\varepsilon^{2/N}) > 0 \qquad (\lambda_\varepsilon \ne 0, 1).$$
And by (5.21), we then have
$$\left(\frac{\partial^2 I}{\partial B_{1m}^2}\right)_{B_{1m}=0} > 0.$$
Since the argument is clearly independent of which off-diagonal element $B_{ij}$ (with $x^i$ and $x^j$ in different sets) we take, it follows that the test is locally unbiased. We have therefore proved:

Theorem III. If $x^1, \dots, x^m$ have the joint normal distribution (5.1), then the likelihood ratio test for the hypothesis that the $x$'s are independent in sets is locally unbiased.

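In certain types of statistical material it may be important to consider, not the independence of the $x$'s themselves, but that of their deviations from regression functions. For example, in the case of several related time series, it may be desirable to eliminate the trend of each $x^i$ by means of, say, a second-degree polynomial in $t$. Consider then in general a population whose distribution function is of the form
$$f(x^1, \dots, x^m \mid x^{m+1}, \dots, x^{m+q}) = \frac{|B_{ij}|^{1/2}}{(2\pi)^{m/2}}\exp\big[-\tfrac12 B_{ij}(x^i - C^i_\rho x^\rho)(x^j - C^j_\nu x^\nu)\big] \qquad (\rho, \nu = m + 1, \dots, m + q),$$
with unknown $B_{ij}$ and $C^i_\rho$. The likelihood ratio for testing the hypothesis $H_I$ that the sets of deviations
$$x^1 - C^1_\rho x^\rho, \dots, x^{m_1} - C^{m_1}_\rho x^\rho;\ \dots;\ x^{m_{p-1}+1} - C^{m_{p-1}+1}_\rho x^\rho, \dots, x^m - C^m_\rho x^\rho$$
are independent is
$$\lambda_I = \Big(\frac{|d^{ij}|}{d_1 \cdots d_p}\Big)^{N/2},$$
where
$$d^{ij} = \sum_{\alpha=1}^{N}(x^i_\alpha - c^i_\rho x^\rho_\alpha)(x^j_\alpha - c^j_\nu x^\nu_\alpha),$$
$d_k$ being the determinant of the $d$'s connected with the $k$-th set, and $c^i_\rho$ is the usual least squares estimate of $C^i_\rho$, given by
$$c^i_\rho = a^{i\nu}a_{\nu\rho},$$
with
$$a^{rs} = \sum_{\alpha=1}^{N} x^r_\alpha x^s_\alpha \quad (r, s = 1, \dots, m + q), \qquad \|a_{\nu\rho}\| = \|a^{\nu\rho}\|^{-1} \quad (\nu, \rho = m + 1, \dots, m + q).$$
An examination of the characteristic function of the $d^{ij}$ shows that their distribution law is the same as that of the $v^{ij}$ of the preceding discussion, except for the fact that $N - 1$ is replaced by $N - q$. Consequently the above results on freedom from bias, and also those of the next section, apply equally well to the $\lambda_I$ test for the independence of deviations from regression functions.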
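For the detrended case the only change is that each variate is first replaced by its residuals from a least-squares regression on the fixed variates $x^\rho$ (for a quadratic trend, the columns $1, t, t^2$). A minimal sketch with illustrative names, again using plain Gaussian elimination:

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("fixed variates are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def regression_residuals(y, fixed):
    """Residuals y - c_rho x^rho from the least-squares regression on the fixed variates."""
    a = [[sum(u * w for u, w in zip(ci, cj)) for cj in fixed] for ci in fixed]  # a^{nu rho}
    rhs = [sum(u * w for u, w in zip(ci, y)) for ci in fixed]
    coef = solve_linear(a, rhs)
    return [ya - sum(c * col[k] for c, col in zip(coef, fixed)) for k, ya in enumerate(y)]


def deviation_independence_criterion(series, fixed, set_sizes):
    """lambda_I^{2/N} = |d^{ij}| / (d_1 ... d_p) for consecutive sets of series."""
    res = [regression_residuals(s, fixed) for s in series]
    d = [[sum(a * b for a, b in zip(ri, rj)) for rj in res] for ri in res]
    denom, start = 1.0, 0
    for size in set_sizes:
        block = [row[start:start + size] for row in d[start:start + size]]
        denom *= determinant(block)
        start += size
    return determinant(d) / denom


t = [1, 2, 3, 4, 5]
ones = [1] * len(t)
print(deviation_independence_criterion([[1, 3, 2, 5, 4], [1, -1, 0, -1, 1]], [ones, t], [1, 1]))
# approximately 0.375
```

With the fixed variates `[ones, t]`, adding any linear trend to the series leaves the residuals, and hence $\lambda_I^{2/N}$, unchanged; with the single fixed variate `ones` the $d^{ij}$ reduce to the centered cross products $v^{ij}$ of the previous discussion.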
6. On the moments of $\lambda_I^{2/N}$. Although we have succeeded in proving the unbiased nature of the preceding test only in the local sense, we can show that the moments of the criterion $\lambda_I^{2/N}$ have a property which seems very closely related to that of furnishing a completely unbiased test. For it can be shown that each of the quantities
$$E\big[(\lambda_I^{2/N})^h\big] \qquad (h = 1, 2, \dots)$$
is greater when $H_I$ is true than when any alternative $H'$ holds. It will perhaps be sufficient to prove this statement in detail for the case where $h = 1$ and where $H_I$ is the hypothesis that the matrix $\|B_{ij}\|$ has the form $\|B_{i_1 j_1}\| \dotplus \|B_{i_2 j_2}\|$:


Bn Bit 

0 

0 

Bn Bit 

Bit Bn 


0 

0 


BuBu 


0 

0 

WBiiuW 


in the notation of the preceding section we then have 

ii ,ji = 1,2,; it ,jt = 3,4; i », = 6, • • ■ , m. 

Even when H is not true we find that 

ffi 1 i Bfl ))’^ I* I «'”• r*! = Q(B, N — 1, to) 0(B,N — 1 + 2h, TO — 4) 

' ^ G{B, N-l + 2h,my ' G(5> ~ 1, m -W" ’ 

where $\bar B$ is formed from $\|B_{ij}\|$ as in section 5. Using the definition of the $G$'s in section 5 and the Jacobi theorem, we can write (6.1) in the form

il[l c'M* I'a’*" n = 

where $\mathfrak{B}$ is the determinant of the matrix composed of the first four rows and columns of $\|B^{ij}\|$. In the general case we therefore have


Bn 

Bit 

Bn 

B» 

Bn 

Ba 

J5m 

Bti 

Bn 

Ba 

Bit 

Bu 

Bn 

Bft 

Ba 

Ba 


. 1, .epUco B .„d B, by B.,„ + ffffj;' + 

_ ijij + fii cij i" t,, respectively, indicatmg this replacement bv a 
prime, we obtain 




( 6 . 2 ) 
Treating $B'$ as a bordered determinant, we can reduce it to

B' = 5(123) (1 + 

= 5(i«(l + + Billh^lUu) 

= 5„(1 + )(1 + B'd^h^iX’) 

= 5(1 + -H BUXXXl + BdXX!)(^ + BinXXh 


where the subscripts on the $B$'s indicate the sets of $\xi$'s still contained in the determinants, and $\|B^{ij}\| = \|B_{ij}\|^{-1}$. Similarly,

(6.4) 5' 


= .8(1 + 

the inverse now being taken with respect to 1| .8 ||. 

But between, say, 5(12) and 5(12’, there is the relation 


/ft S*a/2 _ D*232 

-^^(12) — -0(12) — -0(12)13(12)ij31^(12) , 

where |1 5(jj),,„ |1 = || B)*^) ir\ that is, the inverse of the matrix obtained by 
deleting the first four rows and columns of j] Blh) 1|. Consequently 

< BdXX’ 


with equality holding only for those values of the $\xi$'s for which

5’(iJ5 = 0 fa = 6, .. . , m. 


And this set of $\xi$'s will not make up the entire $\xi$ space unless $\|B_{ij}\| = \|B_{i_1 j_1}\| \dotplus \|B_{i_2 j_2}\|$. Applying the same kind of reasoning to the other quadratic forms in (6.4), we can therefore show that


< J(i + ... (1 -b di 


The last form can be reduced to a sum of squares with unit coefficients by a
linear transformation of the $\xi$'s; thus

rj3l(v-i)5,-i(v-i)^M^j 

( 6 . 6 ) 

< 5 "' / I r‘(l + ... (1 -b d£. 

And by making use of the fact that

5(128) = 5p28) • 1 5(12) iii, I, 
we can express the right-hand side of (6.6) as

This in turn becomes [cf. (6.4)]

5-' /1 5 ( 1 , r‘(i d- B"- d- 5jij‘ f 

X {i+S\ilUiUhP''0- + dk 

= / 1 5(i,„n r'd + 5‘"‘f.'”f,?)“*^(l + Bell” 

X (1 + sslf + sf If j'f)-»"+” di 


At this stage we can write

I 5 (i,nn I = I 5 (,n 1(1 + B*’"‘|.‘fe,'f)(l + 

where |l5*iY“ |1 = IP^fand apply the relation 

BVr = Buy - BaYBa,(„,BlY*, II = II BUY ir. 

Therefore,

unless ?ifBUY = 0 (fa = 3,’4). We can thus continue as follows 

< is.„. r/'(I f<f) 

X (1 -f- ?lf)-‘'"(i + df. 

Transforming the $\xi$'s, we get

| 5(.„ r' / |5aV” rHl -I- ^™)T*‘''"''’(1 + 2?'f f|f)-‘''^+» 

Since 1 5(iY" | ^ = 15(1),,;, |, this becomes 
1 5,1,1 r / (1 + f'Y)-‘''(l -]- 2j(« jW)-i(iv+n 

■ X (1 -b 2tif f'f)"‘^(l + 2iif ijf )-«^+» di 

= I (H-2i'f5,)-*''(!+2€jfsU>)-‘<''''-» 

( 

di. 


, X (1 + 2{Jf J'f)-‘'^(1 + 
Collecting these results, we finally obtain

<Kif(i + Zf'f + s iif 

with equality only in case $H_I$ is true. But the right side of (6.7) is the first
moment of $\lambda_I$ computed under the hypothesis $H_I$, while the left side gives
the corresponding moment in the general case.
The possibility of carrying out this reduction for the case in which the matrix
$\| B \|$ has more than two blocks, or blocks of unequal size, seems sufficiently
clear. And to obtain higher moments, we have only to introduce the proper
number of $\xi$'s into each set. We then have:

Theorem IIIa. Let $\lambda_I$ be the likelihood ratio appropriate to testing the hypothesis
$H_I$ that the normally distributed variates $x^1, \ldots, x^m$ fall into the mutually
independent sets $x^{i_1}, x^{i_2}, \ldots$. Then the expected value of $\lambda_I^h$,
$h = \frac12, 1, \frac32, \ldots$, is greater under the null hypothesis $H_I$ than under any
alternative hypothesis in $\Omega$.
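As a concrete illustration of the block hypothesis $H_I$, the sketch below assumes the familiar determinantal form of the criterion for independence of sets (Wilks [14]), $\lambda_I^{2/N} = |a_{ij}| / (|a_{i_1 j_1}|\,|a_{i_2 j_2}|\,|a_{i_3 j_3}|)$, with $a_{ij}$ the sample sums of products. That formula is not restated in this excerpt, so treat it as an assumption; the helper names and matrices are hypothetical. The ratio equals 1 when the off-diagonal blocks vanish and is below 1 otherwise, by Fischer's determinant inequality.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def principal_minor(m, idx):
    """Submatrix on the rows and columns listed in idx."""
    return [[m[i][j] for j in idx] for i in idx]


def independence_criterion(a, blocks):
    """Hypothetical helper: |a| divided by the product of the block determinants."""
    denominator = Fraction(1)
    for idx in blocks:
        denominator *= det(principal_minor(a, idx))
    return det(a) / denominator


# The sets (1, 2), (3, 4), (5, ..., m) of the text, with m = 5 and 0-based indices.
BLOCKS = [[0, 1], [2, 3], [4]]

# Hypothetical positive definite sums-of-products matrices.
A_GENERAL = [[4, 1, 1, 0, 1],
             [1, 3, 0, 1, 0],
             [1, 0, 5, 1, 1],
             [0, 1, 1, 4, 0],
             [1, 0, 1, 0, 3]]
A_BLOCK = [[4, 1, 0, 0, 0],
           [1, 3, 0, 0, 0],
           [0, 0, 5, 1, 0],
           [0, 0, 1, 4, 0],
           [0, 0, 0, 0, 3]]

print(independence_criterion(A_GENERAL, BLOCKS))  # a fraction strictly between 0 and 1
print(independence_criterion(A_BLOCK, BLOCKS))    # 1
```

Exact `Fraction` arithmetic is used so that the equality case can be tested without rounding error.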


7. The general regression problem. Let the variates $x^1, \ldots, x^t$ be distributed according to the law


(7.1) 


B.i I* 


Throughout this section, let the ranges of the indices be

$$i, j = 1, \ldots, t; \qquad p, q = t + 1, \ldots, m;$$
$$r, s = 1, \ldots, m; \qquad r', s' = 1, \ldots, t + q;$$
$$\mu, \nu = t + 1, \ldots, t + q; \qquad \sigma, \tau = t + q + 1, \ldots, m.$$

In (7.1) we therefore have $t$ random variates and $m - t$ fixed variates. Consider the hypothesis $H$ that the $x^i$ are independent of the last set of $x$'s, namely
$x^\sigma$. We have

$\Omega$: $\| B_{ij} \|$ positive definite, $-\infty < C^i_p < \infty$,
while for $\omega$ we impose the additional requirement

$$C^i_\sigma = 0.$$

Thus in general we have for the distribution of random samples $O_N$, $N > m$,


( 7 . 2 ) 


—e 
while when H is true, we have


(7.3) 


P = 




rim 


Differentiating (7.2) with respect to the $B$'s and $C$'s and setting the derivatives
equal to zero gives us the conditions

$$\sum_{\alpha=1}^{N} C^i_p x^p_\alpha x^q_\alpha = \sum_{\alpha=1}^{N} x^i_\alpha x^q_\alpha, \qquad (7.4)$$

(7.5) = - CixV). 

iV 1 


As in section 2, we put

$$a^{rs} = \sum_{\alpha=1}^{N} x^r_\alpha x^s_\alpha,$$

and assume that the fixed values have been so chosen that $\| a^{pq} \|$ is positive
definite. Then (7.4) and (7.5) can be combined to give


v 

^ ~ ]V ^ ^ a , 

where $\| a_{pq} \| = \| a^{pq} \|^{-1}$. It then follows that


Similarly, 


where

$$\| a_{\mu\nu} \| = \| a^{\mu\nu} \|^{-1}.$$


The matrix $\| a^{rs} \|$ will be positive definite except for a set of probability zero,
so that we can consider $\| a^{ij} - a^{ip} a_{pq} a^{qj} \|$ as the inverse of the matrix obtained by removing
the last $m - t$ rows and columns of the inverse of $\| a^{rs} \|$, and $\| a^{ij} - a^{i\mu} a_{\mu\nu} a^{\nu j} \|$ as the
inverse of the matrix obtained by removing the last $q$ rows and columns of
$\| a^{r's'} \|^{-1}$. Then by the Jacobi theorem






so that the appropriate likelihood ratio is given by 
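$$\lambda = \left[ \frac{| a^{ij} - a^{ip} a_{pq} a^{qj} |}{| a^{ij} - a^{i\mu} a_{\mu\nu} a^{\nu j} |} \right]^{N/2} = \left[ \frac{| a^{rs} |\, | a^{\mu\nu} |}{| a^{pq} |\, | a^{r's'} |} \right]^{N/2}. \qquad (7.6)$$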
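The criterion for this regression problem can be computed in two ways: from residual sums of products, $\lambda^{2/N} = |a^{ij} - a^{ip}a_{pq}a^{qj}| \,/\, |a^{ij} - a^{i\mu}a_{\mu\nu}a^{\nu j}|$, or, via the Jacobi theorem, from four determinants, $|a^{rs}|\,|a^{\mu\nu}| \,/\, (|a^{pq}|\,|a^{r's'}|)$. The minimal sketch below checks that both agree in exact arithmetic for $t = 2$, $q = 1$, $m = 4$; the data and helper names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def inverse(m):
    """Exact inverse by Gauss-Jordan elimination (assumes m is nonsingular)."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        pivot_value = a[c][c]
        a[c] = [x / pivot_value for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                factor = a[r][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def block(m, rows, cols):
    return [[m[i][j] for j in cols] for i in rows]


def residual_det(a, resp, fixed):
    """|a^{ij} - a^{ip} a_{pq} a^{qj}| for i, j in resp and p, q in fixed."""
    inv = inverse(block(a, fixed, fixed))
    cross = block(a, resp, fixed)
    k = len(fixed)
    resid = [[a[i][j] - sum(cross[ii][p] * inv[p][q] * cross[jj][q]
                            for p in range(k) for q in range(k))
              for jj, j in enumerate(resp)]
             for ii, i in enumerate(resp)]
    return det(resid)


# Hypothetical sample: columns x^1, x^2 (random), x^3 (the mu variate, a constant)
# and x^4 (the sigma variate), so t = 2, q = 1, m = 4.
DATA = [[1, 2, 1, 0], [2, 1, 1, 1], [0, 3, 1, 2],
        [3, 1, 1, 3], [2, 2, 1, 4], [4, 0, 1, 5]]
a = [[sum(Fraction(row[r] * row[s]) for row in DATA) for s in range(4)]
     for r in range(4)]

resp, mu, sigma = [0, 1], [2], [3]
lam_resid = residual_det(a, resp, mu + sigma) / residual_det(a, resp, mu)
lam_jacobi = (det(a) * det(block(a, mu, mu))) / (
    det(block(a, mu + sigma, mu + sigma)) * det(block(a, resp + mu, resp + mu)))
print(lam_resid == lam_jacobi, float(lam_resid))
```

Because the residual matrix under $\Omega$ uses more regressors than the one under $\omega$, the ratio cannot exceed 1.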


It will be advantageous to complete the matrix $\| B_{ij} \|$ in (7.1) by defining

$$B_{ip} = -B_{ij} C^j_p, \qquad B_{pq} = C^i_p B_{ij} C^j_q.$$

(Evidently $B_{ip} = 0$ for $i = 1, \ldots, t$ and fixed $p$ if and only if $C^j_p = 0$,
$j = 1, \ldots, t$.) We can now write (7.2) as

(7 7) P{x B) = e" 

We next notice that $\lambda$ is invariant under the transformations


x' a:" /3,x’ 


so that if we put

$$I(B, \lambda_0) = \int_S P(x, B)\, dx^1_1 \cdots dx^t_N,$$

where the integral is extended over the region

$$S:\ 0 < \lambda < \lambda_0,$$

it turns out that

$$I(B, \lambda_0) = I(B^*, \lambda_0),$$

provided 

B,* = alBaa], B*p = OiB/cii, B*, = OiBkrPl. 

To prove the locally unbiased character of the test, we may therefore consider 
the derivatives 


O-D-id oJJiffOn^T 

and assume that $\| B^* \|$ and $\| a^{pq} \|$ are in diagonal form. We also observe
that $\lambda$ is unaltered by the transformation

$$x^i \to x^i + B^{ij} B_{jp} x^p.$$


We therefore have 


= dx. 


d 

dBl 


mt, X,) 


-2 


Boo r 

^iift 


I. 


N 

S x’‘aX’ae 

'3 a"»l 


“ ‘ dx, 


Thus, 
which is easily seen to be zero. Again, consider a non-repeated second partial
derivative, say



This plainly vanishes if $k \neq l$, but it is by no means easy to see what happens
when $k = l$, even when $\sigma \neq \tau$. Let us therefore study the distribution law of $\lambda$
for the case

= 0, i 1. 

(We shall not, however, assume that the transformation $B \to B^*$ has been
made on the $B$'s.)

Define 

= Bj^ ~ Bj,,B'’B,^ , 

r = a" - 

where $\| a_{pq} \|$ now stands for the inverse of $\| a^{pq} \|$. These expressions will
arise when we adapt Wilks' method of moment generating operators [13], based
on the identity


(7,8) / dx\... dxf, exp (-5,, a”’) 


to the problem. We shall understand from now on that $B = | B_{ij} |$ and

l.fl ,, - 11/1, 11 , Let us rearrange the form in the exponential on the 
right, thiKS, 

= (5pa^' + 2B^y’ + 


- B,iB'’B„a°\,a'‘') _ B,iB'’Bi,d" 

= Q - B„B'^Bi,a’^ 

= Q-B'%,. 


's', ana a 


A subscript ^ Will denote the result of replacing Bf,., by B,,,, + B,>, 

MW thT! ^u" ' <^otisid«r 

inents ha™ ten ZS 
Let us integrate first with respect to the . Wilks has shown how to write
Q'^ in the form 

Q'^ = —Qi? + oT^iiiv — , 

Ofi 

where 

Qv = Biy^a!'" + 2Bpi»B'^'’B,.ar^ + B^^B'i’B,.ar'^a,,a\ 


This latter expression is thus free of the . Consequently, 






where 




which can be written 

The method of reduction used by Wilks can now be applied to Qip and Qip,
and gives 

q'^ + = Bp,^ByB„^a'‘'’ + 2Bp.pByB,,a^^ + 

an expression which does not involve the $\xi$'s. Thus

(7 10) J e"“^ = TT*® I a"" e"®" ■ B^^, 


Now the quantity 




B 


where B'' stands for the cofactor of B^, in || B,, |i, can bo expressed in terms 
of B^, provided we use our assumption that B^, = 0, i 9 ^ 1, whereupon 
reduces to the single term yB^^. In fact, we have 

Blgy a” f] = Z ri V'(iV - m + < + 1 - z, 2 h) | a”" 

t-l 

X exp (-Bp,pa^^) = 2 £-Ii(v-j)+/.+H 

i-i p-o v! 


(7.11) 
where, following the notation used by Wilks [13],

K = exp 


$$\psi(a, b) = \frac{\Gamma[\frac12 (a + b)]}{\Gamma[\frac12 a]}.$$
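With $\psi$ defined this way, the product identity used later in the derivation, $\psi(a,-1)\,\psi(a-1,-1)\cdots\psi(a-2h+1,-1) = \psi(a,-2h)$, is a telescoping product of Gamma ratios: each factor's numerator cancels the next factor's denominator. A quick floating-point check, assuming integer $h \ge 1$ and $a > 2h$ so that every Gamma argument stays positive:

```python
import math


def psi(a, b):
    """psi(a, b) = Gamma[(a + b)/2] / Gamma[a/2], as defined in the text."""
    return math.gamma((a + b) / 2) / math.gamma(a / 2)


def telescoped(a, h):
    """psi(a, -1) psi(a - 1, -1) ... psi(a - 2h + 1, -1) for integer h >= 1."""
    result = 1.0
    for k in range(2 * h):
        result *= psi(a - k, -1)
    return result


for a, h in [(12.0, 1), (12.0, 3), (9.5, 2)]:
    print(a, h, math.isclose(telescoped(a, h), psi(a, -2 * h), rel_tol=1e-12))
```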


And (7-1^) written as 


(7.13) 


I"! 


rlKAi -g)+h] d* 
^0 y\TIKN - g) + A + p] a«'' 


^ V2/' - qj -rn ,] d /„-(1(a?-«)+*)^ „ 

X I ^^rTTvf^ 1. ..T ii*.*' ' )u*J3f 


here stands for the result of replacing Bu by Bn — u. Changing Pr',’ into 
T 4 - integrating, we then find that by virtue of (7.10) 

Pr’t' n ’ 


Mg^.|a"na''" r‘] = nil-lo'”r|a'"r‘Bfl‘'e-«' 

_-\ t ' ny ’ riM^r ~ 9) + 6 ] 0' f 

^ ^ M 7! r[J(W - qf+h+p] 317' J 




14^ 


Now 

I n^(N - 5 + 26 + 1 ~ f, -1), 


so that (7.13) becomes

E[gi> 1 a" r W"' l"^] “ n - m + < H- 1 - f, 26) 
(7,14), . Xn^(iV-g + 26 + l-f,-l)la«r|a'"r‘ 


V '^y' - g) + 6] a' 

^ ^ ,4i!71r[i(N_g) + 6 + .]al7'^'^'’“ 

Comparing (7.14) with (7.12), and making use of the fact that

$$\psi(a, -1)\, \psi(a - 1, -1) \cdots \psi(a - 2h + 1, -1) = \psi(a, -2h),$$

we thus have

1*1“" * I = .Sir*"' n 'Pi'N - OT + < + 1 - i, 26) 


t 

X n i('(iV - g + 26 + 1 - i, -26)1 o” I* I a"' 


vvy' r[KAr-g) + 6] „ 

mN -q) + h + y]^' • 
Setting the $\xi$'s equal to zero, performing the differentiation, and recalling the
definitions of $K$ and $\psi$, we then find

= n ^(iv - m -f f + 1 - t, 2fi) ri V'(JV - g + 2/i + 1 - f, -2/i) 

(7.16) 

^ -«i.u f. r[|(f\f - g) + A] m(N -q)+p] 

i=i v\ T[iiN ~q) + h+v] nm - 5)] ■ 
Taking the first factor from each product, we can convert (7.15) into 

t I 

n iA(Af - wi + < + 1 - i, 2A) n f (fV - g + 2A + 1 - i, -2A) 

i-a t-2 

^ -.bu V (l/B“)' -m + t) + h] Tim - 3 ) + v] 

v\ nm-m + t)] 'rfKlV-g)+ A + r]- 

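This last product of ratios of $\Gamma$'s is equivalent to

$$\frac{\Gamma[\frac12(N - q) + \nu]}{\Gamma[\frac12(N - m + t)]\, \Gamma[\frac12(m - t - q) + \nu]} \cdot \frac{\Gamma[\frac12(m - t - q) + \nu]\, \Gamma[\frac12(N - m + t) + h]}{\Gamma[\frac12(N - q) + h + \nu]}.$$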
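Since $\frac12(N-q) + \nu = \alpha + \beta$ with $\alpha = \frac12(N-m+t)$ and $\beta = \frac12(m-t-q)+\nu$, the displayed product is $B(\alpha + h, \beta)/B(\alpha, \beta)$, the $h$-th moment of a Beta$(\alpha, \beta)$ variable. This is why a beta-type factor $z$ appears in what follows. The sketch below compares the Gamma product with a direct midpoint-rule integration of $E[z^h]$; the parameter values are hypothetical and chosen with $m - t - q > 0$.

```python
import math


def gamma_product(N, m, t, q, h, nu):
    """The displayed product of Gamma ratios, evaluated with log-gamma."""
    lg = math.lgamma
    return math.exp(lg(0.5 * (N - q) + nu) - lg(0.5 * (N - m + t))
                    - lg(0.5 * (m - t - q) + nu) + lg(0.5 * (m - t - q) + nu)
                    + lg(0.5 * (N - m + t) + h) - lg(0.5 * (N - q) + h + nu))


def beta_moment(a, b, h, steps=20000):
    """Midpoint-rule estimate of E[z^h] when z has the Beta(a, b) density."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    dz = 1.0 / steps
    total = sum(((k + 0.5) * dz) ** (a + h - 1) * (1 - (k + 0.5) * dz) ** (b - 1)
                for k in range(steps))
    return total * dz / math.exp(log_norm)


N, m, t, q = 12, 5, 2, 1
for h, nu in [(1, 0), (2, 2)]:
    alpha, beta = 0.5 * (N - m + t), 0.5 * (m - t - q) + nu
    print(h, nu, gamma_product(N, m, t, q, h, nu), beta_moment(alpha, beta, h))
```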

Thus the moments of $\lambda^{2/N}$ are connected with an integral equation of type B
[12], and $\lambda^{2/N}$ is distributed like the product

$$z\, \theta_2 \cdots \theta_t, \qquad 0 < z < 1, \quad 0 < \theta_i < 1,$$

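where the joint distribution of the $\theta$'s is

$$f(\theta) = \prod_{i=2}^{t} \frac{\Gamma[\frac12(N - q + 1 - i)]}{\Gamma[\frac12(N - m + t + 1 - i)]\, \Gamma[\frac12(m - t - q)]}\, \theta_i^{\frac12(N - m + t + 1 - i) - 1} (1 - \theta_i)^{\frac12(m - t - q) - 1},$$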

and $z$ is distributed independently of the $\theta$'s with the distribution

^-0 v! HIKN — m + i), i(w — f — g) + v]' 

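The probability that $0 < \lambda < \lambda_0$ is therefore

$$I(y, \lambda_0) = \int_S f(\theta) F(z)\, dz\, d\theta_2 \cdots d\theta_t,$$

where $S$ is the region $0 < z\, \theta_2 \cdots \theta_t < \lambda_0^{2/N}$. Putting $\varphi(\theta)$ for the upper limit
of $z$ in $S$ for fixed $\theta$, and $S_\theta$ for the projection of $S$ into the $\theta$ space, we then have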

f f ^ CP JW-m+O-l/’.i l+f *\ 

i(y,M) = / /(9)]c-''^“ Z ^ -4,- ^rr -■ y — 

Ja, [ »-o v\ Jo B[f(iV — m + ^), f (to-T- f — g) + v] j 

If we replace $z$ by $(1 - z)$, we then find


iiy, Xo) = f m 


-van (yB^^yBlijm — t — q) + v, m - to + t); 1 — 


»-o vl B[i(TO — t — g) + V, ^{N — TO + 


l-Ol J 
As far as $y$ is concerned, (7.17) is essentially the same as (2.8). The computation
which was made there, together with the type of reasoning employed in
the latter part of section 5 in connection with the independence test for several
blocks, then shows that

— (0 < 6 < 1 ). 

dy 

Remembering that 

y = , 


we see that 



—= 2o", 


and we remark that the assumed positive definiteness of $\| a^{pq} \|$ implies that of
$\| \bar a^{\sigma\tau} \|$. Hence the relation



together with the fact that we could have obtained the analogue of (7.17)
under the assumption

$$B_{i\sigma} = 0, \qquad i \neq i_0,$$

where $i_0$ is any fixed number in the set $1, \ldots, t$, shows that the matrix of
second partial derivatives is positive definite when $H$ is true.

Thus we have

Theorem IV. Let $x^1, \ldots, x^t$ be normally distributed about means which are
linear functions of certain fixed variates $x^{t+1}, \ldots, x^m$. Then the likelihood ratio
test for the hypothesis that the distribution of $x^1, \ldots, x^t$ depends only on a selected
subset $x^{t+1}, \ldots, x^{t+q}$ of the fixed variates is locally unbiased.

The result of this section has its most immediate application to those problems
in the analysis of variance which require simultaneous consideration of several
interrelated dependent variables $x^1, \ldots, x^t$ in conjunction with a given set of
independent variables $x^{t+1}, \ldots, x^m$ [15]. For the usual hypothesis to be tested
in this case is that $x^1, \ldots, x^t$ are jointly independent of, say, $x^{t+q+1}, \ldots, x^m$.

To return to the general case of (7.1), the method of this section can also be
used to test the hypothesis that the regression coefficients referring to the $x^\sigma$
have particular values, say

$$C^i_\sigma = C^i_{\sigma 0} \qquad (i = 1, \ldots, t;\ \sigma = t + q + 1, \ldots, m),$$

the remaining $C$'s and the $B$'s being left unspecified. Since we have

$$x^i - C^i_\mu x^\mu - C^i_\sigma x^\sigma = (x^i - C^i_{\sigma 0} x^\sigma) - C^i_\mu x^\mu - (C^i_\sigma - C^i_{\sigma 0}) x^\sigma,$$
by the device of replacing $x^i_\alpha$ by $x^i_\alpha - C^i_{\sigma 0} x^\sigma_\alpha$, we can reduce this problem to
that of testing the hypothesis that

$$C'^i_\sigma = C^i_\sigma - C^i_{\sigma 0} = 0.$$


Similarly, the problem of testing whether the linear functions ul = alCl have 
specified values ulo comes under the same heading [7], 

A particularly interesting case of the general regression problem is that in
which $m = t + q + 1$, so that the null hypothesis $H$ states that the chance
variables $x^i$ are independent of the fixed variate $x^m$, though they may depend
upon $x^{t+1}, \ldots, x^{t+q}$. In this case we are able to find the exact distribution
law of $\lambda$ without assuming that any of the regression coefficients $C^i_m$ are zero.
For the quantity


(7.18) 




v=0 


(y.E'r 

r' 


[§ (N—g) +A+>'] 


which would have occurred in (7.11) had it not been for the restriction $B_{i\sigma} = 0$
($i \neq 1$), can now be expressed in terms of the $B_{im}$ even without this restriction.
By definition

and the vanishing of the $B_{im}$ is equivalent to the vanishing of the regression
coefficients $C^i_m$ associated with $x^m$. And since

I 5., - I = B - , 

we can write (7.18) in the form 

f 1 r m -q)+h] £ 

V ! ni(N -q) + h+v]du> ^ ’ 

where 

||Bfl„|l = II 

is positive definite provided $u$ is sufficiently small. Thus the moments of $\lambda^{2/N}$
can be found from (7.15) if we put a”""B'’Br^iB^, = y,^B'^ in place of yB^\.
Moreover, it can be seen that when the value $m = t + q + 1$ is substituted
into (7.15), that expression reduces to


V' iy'jE 0" B[^(J\f — m -H 1) ft, — g — 1) + r] 
I ~B [J(iV — OT -f 1), " g - 1) + r] 


>.“0 


so that $\lambda^{2/N}$ is distributed like $w$, where

■ (^1 RO\i' „„4(x-m+l)—1^ NiCm-o-D-l+i. 

(7 19) fiw) = E 


K -0 v! B [^{N — m -t- 1), i(m - g - 1) + r]' 
The distribution law of $\lambda$ for this case is thus closely related to that obtained
in the treatment of the regression problem with one dependent variate in
section 2. Applying the argument used there, we can obtain:

Theorem IVa. The likelihood ratio test for the hypothesis that in a population
of the type (7.1) the variates $x^i$ are independent of $x^m$ (the case $m = t + q + 1$
of Theorem IV) is completely unbiased.

If we specialize the problem somewhat further, considering the case $q = 0$,
$x^m = 1$ (so that $m = t + 1$), we find that the likelihood ratio takes the form

$$\lambda^{2/N} = \frac{1}{1 + T^2/(N - 1)},$$

where $T$ is Hotelling's generalization [5] of Student's ratio, computed from the
sample means and the sums of products $\sum_\alpha (x^i_\alpha - \bar x^i)(x^j_\alpha - \bar x^j)$. In this case we are testing the hypothesis that the $x^i$ are
distributed with zero means. The exact distribution law of



was recently published by P. L. Hsu [6], who obtained it in a very elegant
fashion by means of the Laplace transform. He has also shown that the resulting test is most powerful in the sense that, of all critical regions $S$ for which

$$P(x \in S) = \varepsilon + a\, h_i B^{ij} h_j + R(h)$$

(where $\varepsilon$ and $a$ are independent of the $B$'s and of the means $h_i$, and $R$ is an
infinitesimal of at least the third order as all $h$ tend to zero), the critical region
defined by


has the largest possible value of $a$. Tang's tables [11] make it evident that
this largest possible value of $a$ is actually positive and that the test is in fact
unbiased for all values of the $h$'s when $\varepsilon = .05$ or $\varepsilon = .01$. The results of this

thZ ^ it 

of Hotelling’s T is by no means confined to the above case. 

bv I Studentized D\ 'devised 

by Mahalanobis for measuring the "distance” between two normal multU 

brrcXfrd ’n ? HoteHing-s T. This fact is pointed out 

for the case in which obtained the exact distribution of H’ 

am assured totavc Z T" ^^ch the samples are drawn 

allowed to have different s t^^ variances and covariances, but are 

of Hsu’s They also note ^ independent 
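For the special case $q = 0$, $x^m = 1$ treated above, the criterion is the ratio of the determinant of sums of products about the sample means to the determinant of sums of products about zero, and the matrix determinant lemma gives $\lambda^{2/N} = 1/(1 + T^2/(N-1))$ with $T^2 = N\bar x' S^{-1}\bar x$, where $S$ is the unbiased sample covariance matrix. The sketch below checks this identity in exact arithmetic on a hypothetical sample; `det` and `inverse` are illustrative helpers, not taken from the source.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def inverse(m):
    """Exact inverse by Gauss-Jordan elimination (assumes m is nonsingular)."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        pivot_value = a[c][c]
        a[c] = [x / pivot_value for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                factor = a[r][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


# Hypothetical sample of N = 5 observations on t = 2 variates.
X = [[1, 2], [2, 0], [3, 3], [0, 1], [4, 2]]
N, t = len(X), len(X[0])
mean = [Fraction(sum(row[i] for row in X), N) for i in range(t)]

# Sums of products about the sample means (Omega) and about zero (omega: zero means).
A = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) for j in range(t)]
     for i in range(t)]
A0 = [[sum(Fraction(row[i] * row[j]) for row in X) for j in range(t)] for i in range(t)]
lam = det(A) / det(A0)  # the criterion lambda^{2/N}

# Hotelling's T^2 = N xbar' S^{-1} xbar with S = A / (N - 1), so S^{-1} = (N - 1) A^{-1}.
A_inv = inverse(A)
T2 = N * (N - 1) * sum(mean[i] * A_inv[i][j] * mean[j] for i in range(t) for j in range(t))
print(lam == 1 / (1 + T2 / (N - 1)))  # True
```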
8. Summary. The method of likelihood ratios is of practical as well as theoretical importance, because it provides a unified approach to the problem of
testing statistical hypotheses. In this paper we have investigated many of the
tests which this method yields when applied to hypotheses about sets of regression coefficients and covariances in normal populations. By studying the
probability functions of the corresponding $\lambda$-criteria we are able to show that
these tests are "good," in the sense that they are unbiased even for small samples.

Among the completely unbiased tests which can be based on the likelihood-ratio method, our discussion includes the multiple correlation coefficient, with
or without fixed variates [13], Hotelling's generalized T test [6] and the statistically equivalent "Studentized $D^2$" [1]; the ordinary analysis of variance
and covariance for orthogonal or non-orthogonal data [11, 16], as well as related
tests of linear hypotheses in the case of one chance variable.

With respect to the analysis of variance for two or more variables [15] and
certain other hypotheses regarding regression coefficients in multivariate populations, though there are indications that the tests are completely unbiased, we
have succeeded in demonstrating this property only in the local sense.

Finally, the likelihood-ratio test for the hypothesis that the variates fall into
certain specified mutually independent sets [14] is shown to be unbiased, at
least locally, and has the additional property described in Theorem IIIa.

In conclusion, much more than a word of acknowledgment is due to Professor 
S. S. Wilks of Princeton University, to whom the writer is greatly indebted for 
advice and encouragement. 
INTRODUCTION

An important portion of algebraic invariant theory has been that devoted to a
certain class of invariants called seminvariants, semi-invariants, or more rarely,
half-invariants. Of these terms, "seminvariant" seems to be the one now
commonly accepted. The same three terms have been applied at various times
and by various writers to a system of moment functions of importance in statistical theory. The statistician using these terms has frequently done so with
an apology for appropriating a term of the algebraist. As a portion of this
paper we shall show that the moment functions of this system are actually
algebraic seminvariants, and that there are other systems of moment functions
which are equally entitled to the name seminvariant.









PAUL L. DRESSEL


The study of the statistical seminvariants of a population leads naturally to
consideration of the problem of obtaining from a sample unbiased estimates of
the value of these seminvariants. Estimates of this kind have been defined
and computed by previous authors, but no simple method of obtaining the
estimates has been given. In this paper a simple procedure for calculation is
given, and it is furthermore demonstrated that these estimates form an important
phase of statistical seminvariant theory.

The system of notation used for moment functions is that of R. A. Fisher,
although the actual letters used in representing particular moment functions are
not altogether the same as those used by Fisher. In general, a moment function
of the population has been indicated by a Greek letter, the corresponding sample
moment function by the corresponding English letter and the estimate by the
corresponding capital English letter.

A list of references appears at the end of the paper. Each reference has been
assigned a number, and this number placed in square brackets is used in the body
of the paper to indicate the reference. Pages of the reference are indicated by
additional numbers inserted in the brackets and separated from the reference
number by a semicolon.


I. THE RELATION OF THE ALGEBRAIC SEMINVARIANT THEORY TO THE MOMENT
FUNCTIONS OF STATISTICS

The purposes of this chapter are: (1) to review briefly and give adequate
references to certain important phases of algebraic seminvariant theory, (2) to
apply this material to the moment functions of statistics.

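1. Definitions. Any function of the coefficients of the binary form

$$f = \sum_{i=0}^{n} \binom{n}{i} a_i X^{n-i} Y^i, \qquad a_0 \neq 0, \qquad (1)$$

which is invariant under the transformation

$$X = \gamma_1 \xi + \gamma_2 \eta, \quad Y = \delta_1 \xi + \delta_2 \eta, \qquad \Delta = \begin{vmatrix} \gamma_1 & \gamma_2 \\ \delta_1 & \delta_2 \end{vmatrix} \neq 0, \qquad (2)$$

is called an invariant of the form $f$. See Dickson [1; 31-36].

Any function of the coefficients of $f$ which is invariant under the transformation

$$X = \xi + \gamma \eta, \qquad Y = \eta, \qquad (3)$$

is called a seminvariant of $f$.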

Invariants and seminvariants, indeed, may be defined by means of the two operators

$$\Omega = a_0 \frac{\partial}{\partial a_1} + 2 a_1 \frac{\partial}{\partial a_2} + \cdots + n a_{n-1} \frac{\partial}{\partial a_n}, \qquad O = n a_1 \frac{\partial}{\partial a_0} + (n - 1) a_2 \frac{\partial}{\partial a_1} + \cdots + a_n \frac{\partial}{\partial a_{n-1}}. \qquad (4)$$

A necessary and sufficient condition that a homogeneous
isobaric function of the coefficients of $f$ be an invariant is that it be annihilated
by both $\Omega$ and $O$. See Elliott [2; 113, 124]. The necessary and sufficient
condition that a homogeneous isobaric function of the coefficients of $f$ be a
seminvariant is that it be annihilated by $\Omega$. See Elliott [2; 127].

It should be noted that there is nothing in the definitions above which requires
that invariants or seminvariants be integral, although usually only this type is
discussed. In what follows we shall find it more profitable to discuss homogeneous isobaric fractional seminvariants, the fractional quality resulting from
the appearance of $a_0$ in the denominator.


2. Complete Systems of Seminvariants. By direct application of the transformation (3) to $f$ the system of seminvariants [1; 47]

$$\sum_{i=0}^{r} (-1)^i \binom{r}{i} \frac{a_{r-i}\, a_1^{\,i}}{a_0^{\,i+1}}, \qquad r \le n, \qquad (5)$$

is obtained. This system is a complete system [2; 44, 205, 206], in the sense
that all other seminvariants fractional in $a_0$ and of degree 0 are expressible
rationally and integrally in terms of this system.

Other such systems can be defined. The system of minimum degree seminvariants, the seminvariants of even weight being of degree 2 and those of odd
weight being of degree 3, has played an important role in the algebraic seminvariant theory. Elliott [2; 207-209] discusses this system and gives the general
formula for the even weight seminvariants of the system. So far as the present
writer has been able to discover, the general formula for the odd weight seminvariants has never been published, although Hammond [3] may have obtained it.
After some lengthy but not difficult computation the result has been obtained,
so that the last mentioned system of seminvariants is completely defined by


r _ 1 V -IV o.Oar-. 

(6) = Z (-1)’+^ (■^!' ) ^ ar^ar+.+X 

.-0 \i + rj I 


+ r + l 


2 

do 


■0 _|_ y ^ ^I I 

^0 \ij oq ’ 

It is easily demonstrated that for each of the above seminvariants, and in 
fact for any seminvariant, the sum of the numerical coefficients is zero. Dickson 
[1; 55] gives a suggestion leading to a very simple proof. 


3. The MacMahon Non-Unitary Symmetric Function Principle. Denoting
the roots of $f = 0$ by $\alpha_1, \alpha_2, \ldots, \alpha_n$, the $r$-th power sum of
these roots is defined by

$$s_r = \sum_{i=1}^{n} \alpha_i^r. \qquad (7)$$
The form $f$ may be written $\prod_{i=1}^{n} (X - \alpha_i Y)$.

By a result due to MacMahon [4; 131] the seminvariants of the form $f$ are
identical, except for numerical factors, with those symmetric functions of the
roots of

$$\sum_{i=0}^{n} \frac{a_i}{i!}\, Y^i = 0 \qquad (8)$$

which when expressed in terms of sums of powers of these roots do not contain $s_1$. MacMahon called such symmetric functions "non-unitary."

As a result of this theorem, MacMahon was able to discuss the seminvariants
of a binary form of infinite order by discussing the non-unitary symmetric
functions of the roots of $\sum_{i=0}^{\infty} \frac{a_i}{i!}\, Y^i = 0$.


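4. A Third Complete System of Seminvariants. By application of the result
stated in the previous section, a third complete system of seminvariants can be
immediately obtained. Obviously the power sums $s_r$, $r > 1$, are independent
of $s_1$. By the Waring formula, Burnside and Panton [5; 91-92], if

$$\sum_{i=0}^{n} c_i Y^i = c_0 \prod_{i=1}^{n} (1 - \alpha_i Y), \qquad (9)$$

then

$$s_r = -r \sum \frac{(-1)^{p-1} (p-1)!}{\pi_1!\, \pi_2! \cdots \pi_n!} \left(\frac{c_1}{c_0}\right)^{\pi_1} \left(\frac{c_2}{c_0}\right)^{\pi_2} \cdots \left(\frac{c_n}{c_0}\right)^{\pi_n},$$

wherein the summation extends over all sets of non-negative integers $\pi_i$ with $\pi_1 + 2\pi_2 + \cdots + n\pi_n = r$, and $p = \pi_1 + \pi_2 + \cdots + \pi_n$. Then for $c_i = a_i / i!$,

$$-(r-1)!\, s_r = \sum \frac{(-1)^{p-1} (p-1)!\, r!}{\pi_1!\, \pi_2! \cdots \pi_n!\, (1!)^{\pi_1} (2!)^{\pi_2} \cdots (n!)^{\pi_n}} \left(\frac{a_1}{a_0}\right)^{\pi_1} \left(\frac{a_2}{a_0}\right)^{\pi_2} \cdots \left(\frac{a_n}{a_0}\right)^{\pi_n}. \qquad (10)$$

Placing $B_r = -(r-1)!\, s_r$, the $B$'s form a complete system of seminvariants.
This result has some interesting statistical connections which will be mentioned later.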
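Read with $a_i/a_0$ replaced by raw moments $\mu'_i$ (the substitution supplied by the Roberts Theorem), the partition sum in (10) is the familiar expression of Thiele's seminvariants, the cumulants, in terms of moments about the origin. A minimal sketch, assuming population-style moments $\mu'_i = s_i/n$ of a hypothetical data set: it enumerates the partitions of $r$, takes $p$ as the number of parts and $\pi_i$ as the multiplicity of part $i$, and checks the low orders against central moments.

```python
from fractions import Fraction
from math import factorial


def partitions(r, max_part=None):
    """Yield the partitions of r as non-increasing lists of parts."""
    if max_part is None:
        max_part = r
    if r == 0:
        yield []
        return
    for part in range(min(r, max_part), 0, -1):
        for rest in partitions(r - part, part):
            yield [part] + rest


def thiele(r, raw):
    """lambda_r from raw moments raw[i] = mu'_i via the partition sum of (10)."""
    total = Fraction(0)
    for parts in partitions(r):
        p = len(parts)  # p = pi_1 + pi_2 + ...
        coeff = Fraction((-1) ** (p - 1) * factorial(p - 1) * factorial(r))
        term = Fraction(1)
        for i in set(parts):
            pi = parts.count(i)  # multiplicity pi_i of the part i
            coeff /= factorial(pi) * factorial(i) ** pi
            term *= Fraction(raw[i]) ** pi
        total += coeff * term
    return total


data = [1, 2, 2, 3, 7]
n = len(data)
raw = [Fraction(sum(x ** k for x in data), n) for k in range(5)]
mean = raw[1]
mu = [sum((x - mean) ** k for x in data) / n for k in range(5)]
print([thiele(r, raw) for r in range(1, 5)])
```

The checks reflect the standard facts $\lambda_2 = \mu_2$, $\lambda_3 = \mu_3$ and $\lambda_4 = \mu_4 - 3\mu_2^2$.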


5. Linearly Independent Seminvariants. It follows from the MacMahon non-unitary symmetric function principle, or it can be proved easily in other ways,
that the number of linearly independent seminvariants of a given weight $r$ is
equal to the number of partitions of $r$ which contain no unit part. Furthermore
we have at our disposal a simple method for obtaining a set of linearly independent seminvariants of any given weight.
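The count in this paragraph can be checked directly. The sketch below lists the partitions of $r$ with no unit part (for $r = 6$ these are the power products $(6), (42), (33), (222)$ used below) and confirms the standard identity that their number is $p(r) - p(r-1)$, where $p$ counts all partitions. Function names are illustrative.

```python
def partitions_without_units(r, max_part=None):
    """Yield the partitions of r whose parts are all >= 2, in non-increasing order."""
    if max_part is None:
        max_part = r
    if r == 0:
        yield ()
        return
    for part in range(min(r, max_part), 1, -1):
        for rest in partitions_without_units(r - part, part):
            yield (part,) + rest


def partition_count(r, max_part=None):
    """Number of unrestricted partitions p(r), used only as a cross-check."""
    if max_part is None:
        max_part = r
    if r == 0:
        return 1
    return sum(partition_count(r - part, part) for part in range(min(r, max_part), 0, -1))


print(list(partitions_without_units(6)))  # [(6,), (4, 2), (3, 3), (2, 2, 2)]
for r in range(2, 9):
    print(r, len(list(partitions_without_units(r))), partition_count(r) - partition_count(r - 1))
```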

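For many purposes the power product defined by Dwyer [6; 13] is more
useful than the customary monomial symmetric function. The power product
is defined by the right-hand member and indicated by the left-hand member of

$$(q_1 q_2 \cdots q_r) = \sum \alpha_i^{q_1} \alpha_j^{q_2} \cdots \alpha_l^{q_r}, \qquad (11)$$

the summation extending over all ordered sets of distinct subscripts $i, j, \ldots, l$, where, for convenience, $q_1 \ge q_2 \ge \cdots \ge q_r$. The monomial symmetric function, which will be denoted by $M(q_1 q_2 \cdots q_r)$, is related to the power product by
the identity

$$\pi_1!\, \pi_2! \cdots \pi_r!\, M(q_1^{\pi_1} q_2^{\pi_2} \cdots q_r^{\pi_r}) = (q_1^{\pi_1} q_2^{\pi_2} \cdots q_r^{\pi_r}), \qquad (12)$$

so that a distinction occurs only when there are repeated exponents in the
summation of (11).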
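Identity (12) can be checked by brute force on a few hypothetical roots: the power product sums over ordered tuples of distinct indices, the monomial symmetric function counts each distinct monomial once, and the two differ by the factor $\pi_1!\,\pi_2!\cdots$ for repeated exponents. This sketch assumes the distinct-index reading of (11); it is exponential in the number of roots and meant only as an illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def power_product(exps, roots):
    """(q1 q2 ... qr): sum over ordered tuples of distinct root indices, as in (11)."""
    return sum(prod(roots[i] ** q for i, q in zip(idx, exps))
               for idx in permutations(range(len(roots)), len(exps)))


def monomial(exps, roots):
    """M(q1 q2 ... qr): each distinct monomial counted exactly once."""
    padded = tuple(exps) + (0,) * (len(roots) - len(exps))
    return sum(prod(x ** e for x, e in zip(roots, arrangement))
               for arrangement in set(permutations(padded)))


def multiplicity_factor(exps):
    """pi_1! pi_2! ... for the repeated exponents, as in (12)."""
    return prod(factorial(c) for c in Counter(exps).values())


roots = [1, 2, 3, 5]
for exps in [(2, 2), (3, 2, 2), (4, 2, 1), (2, 2, 2, 2)]:
    print(exps, power_product(exps, roots), multiplicity_factor(exps) * monomial(exps, roots))
```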

If we desire a system of linearly independent seminvariants of weight 6, by
the MacMahon principle we need only to compute the values of the power
products (6), (42), (33), (222) in terms of the $a$'s. In a somewhat different
form these will be presented later.

6. The Roberts Theorem. Roberts, see [2; 231] and [5; 108], demonstrated
the existence of a duality relationship between power sums, $s$'s, and coefficients,
$a$'s, such that corresponding to any seminvariant in terms of $a$'s there exists
a seminvariant in terms of $s$'s obtained by replacing $a_r$ by $s_r$. The proof consists of showing that the annihilator for seminvariants in terms of power sums
is identical in form with $\Omega$, $a_r$ being replaced by $s_r$.

As a result of this duality, each of the systems of seminvariants which have
been obtained yields, upon replacement of $a_r$ by $s_r$, another system of seminvariants. In particular cases it may happen that the systems are identical
when the identities connecting the $a_r$ and $s_r$ are taken into consideration.

We next wish to show that the systems of power sum seminvariants thus
obtained either are identical with certain well known statistical moment functions or lead to new ones.
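A small check of the Roberts duality for the system (5): replacing $a_r$ by the power sums $s_r$ of a set of observations (so that $a_0$ becomes $s_0 = n$) should reproduce the central moments, leave them unchanged when every observation is shifted by a constant, and keep the zero sum of numerical coefficients noted in section 2. The sketch below does this in exact arithmetic on hypothetical data; `seminvariant_5` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def power_sums(xs, r_max):
    """s_0, s_1, ..., s_rmax for the observations xs (s_0 = n)."""
    return [sum(Fraction(x) ** k for x in xs) for k in range(r_max + 1)]


def seminvariant_5(c, r):
    """The expression (5) with c[0], c[1], ... written in place of a_0, a_1, ..."""
    return sum((-1) ** i * comb(r, i) * c[r - i] * c[1] ** i / c[0] ** (i + 1)
               for i in range(r + 1))


data = [1, 2, 2, 3, 7]
n = len(data)
mean = Fraction(sum(data), n)
s = power_sums(data, 4)
shifted = power_sums([x + 10 for x in data], 4)

for r in range(2, 5):
    direct = sum((x - mean) ** r for x in data) / n
    print(r, seminvariant_5(s, r) == direct, seminvariant_5(shifted, r) == direct)
```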

7. Statistical Distributions Represented by Binary Forms. The fact that
statistical distributions may be represented by polynomials has long been
recognized by statisticians; see Thiele [7; 24-26] and Bertilsen [8]. Indeed it
was this fact which led Thiele to the definition of the seminvariants now called
by his name. If we have given $n$ observations $\alpha_1, \alpha_2, \ldots, \alpha_n$, form the polynomial

$$F = \prod_{i=1}^{n} (X - \alpha_i Y). \qquad (13)$$
$F$ is not a binary form, but the seminvariant theory of binary forms is applicable
since seminvariants are functions of the differences of the roots and are independent of the $X$ and $Y$, which appear merely as convenient symbols to indicate
the various terms of the algebraic form.

For distributions containing an infinite number of items the form F is of 
infinite order, but discussion of its seminvariants may be carried on by use of 
the MacMahon principle given in section 3. 


8. Three Systems of Statistical Seminvariants. Before exhibiting some systems of statistical seminvariants it may be well to consider the meaning of
"statistical seminvariant," for this phrase has been undefined. In fact the use
of the phrase is merely a matter of convenience in that it emphasizes the fact
that seminvariant moment functions have not previously been regarded as
algebraic seminvariants. As used here a statistical seminvariant is an algebraic
seminvariant which has some application in statistical theory.

The system of seminvariants (5) yields by application of the Roberts Theorem
the well known system of statistical seminvariants usually called central moments. If $\mu'_r = s_r / n$, the general formula may be written

$$\mu_r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} \mu'_{r-i}\, \mu_1'^{\,i}. \qquad (14)$$


The system of seminvariants (6) likewise leads to 


(15) 


**r+i = r (-1)’+Y. J ) Ji±l_ ' 

1-0 V-f-r/t + r-f- 


/ 






" ° statisticians. 

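The system (10) leads to the well known Thiele seminvariants

$$\lambda_r = \sum \frac{(-1)^{p-1} (p-1)!\, r!}{\pi_1!\, \pi_2! \cdots \pi_r!\, (1!)^{\pi_1} (2!)^{\pi_2} \cdots (r!)^{\pi_r}} (\mu'_1)^{\pi_1} (\mu'_2)^{\pi_2} \cdots (\mu'_r)^{\pi_r}. \qquad (16)$$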


of coefficients. It does not seem that this ffrt h i ^ 

f. stating this idea is to sa^^htrSifSe ^ recognized. 

Xr IS, except for the factor - (r - l)t the sum n7+h aeramvanant 

the Obtoed by th, 

<-0 ll ’ 


equal to zero. 
It is of historical interest to note that MacMahon published his non-unitary
function principle and the resulting set of seminvariants in 1884. Cayley [8]
published an article in 1885 dealing with this same system. Roberts' Theorem
having been known for some time (probably about 20 years), it seems probable
that MacMahon and Cayley were aware of the Thiele seminvariants four to
five years before Thiele's definition [9] by an entirely different method.


9. Linearly Independent Statistical Seminvariants. At the end of section 5
a method was indicated whereby a complete set of linearly independent seminvariants of a given weight $r$ could be obtained. It has been noted previously
that the one-part symmetric function $s_r$ or $(r)$ leads to the Thiele seminvariant $\lambda_r$.
As a further illustration consider the power product (22). From a table of
symmetric functions we find that

$$(22) = \frac{4 a_4}{4!\, a_0} - \frac{4 a_1 a_3}{3!\, a_0^2} + \frac{2 a_2^2}{2!\, 2!\, a_0^2} = \frac{1}{6}\left(\frac{a_4}{a_0} - \frac{4 a_1 a_3}{a_0^2} + \frac{3 a_2^2}{a_0^2}\right),$$

and by the Roberts Theorem the statistical seminvariant

$$\tfrac{1}{6}\left(\mu'_4 - 4\mu'_3\mu'_1 + 3\mu_2'^{\,2}\right)$$

is obtained. In similar fashion a system of linearly independent seminvariants
of weight $\le 8$ has been computed and is given in Table I. For the sake of
brevity they are expressed in terms of central moments. Hence the degree, by
which is meant the maximum degree in the $\mu'$'s, is not apparent in the table.
This definition of degree associates with the statistical seminvariant the degree
(in the usual sense) of the corresponding homogeneous integral seminvariant.
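The expansion of the power product (22) can be verified numerically. The sketch assumes the convention $\sum_i a_i Y^i / i! = a_0 \prod (1 - \alpha_i Y)$, consistent with the factorial weights in (8), so that $a_i = (-1)^i\, i!\, a_0 e_i$ with $e_i$ the elementary symmetric functions of the roots. Under that assumption $(22) = s_2^2 - s_4$ should equal $\frac16(a_4/a_0 - 4a_1a_3/a_0^2 + 3a_2^2/a_0^2)$ for any roots and any $a_0 \neq 0$. Root sets and function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def elementary(roots):
    """Elementary symmetric functions e_0, ..., e_n of the roots."""
    e = [Fraction(1)]
    for x in roots:
        e = [(e[k] if k < len(e) else 0) + (x * e[k - 1] if k > 0 else 0)
             for k in range(len(e) + 1)]
    return e


def coefficients(roots, a0, count):
    """a_0, ..., a_{count-1} under the assumed convention a_i = (-1)^i i! a_0 e_i."""
    e = elementary(roots) + [Fraction(0)] * count
    return [(-1) ** i * factorial(i) * a0 * e[i] for i in range(count)]


def pp22_from_roots(roots):
    """(22) = sum over i != j of alpha_i^2 alpha_j^2 = s_2^2 - s_4."""
    s2 = sum(Fraction(x) ** 2 for x in roots)
    s4 = sum(Fraction(x) ** 4 for x in roots)
    return s2 ** 2 - s4


def pp22_from_coefficients(a):
    return Fraction(1, 6) * (a[4] / a[0] - 4 * a[1] * a[3] / a[0] ** 2
                             + 3 * a[2] ** 2 / a[0] ** 2)


for roots in ([1, 2], [1, 2, 3, 5], [-1, 4, 4, 6, 0]):
    a = coefficients(roots, Fraction(3), 5)
    print(roots, pp22_from_roots(roots), pp22_from_coefficients(a))
```

The numerical coefficients $1, -4, 3$ also sum to zero, as required of a seminvariant.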


10. Statistical Invariants. If the transformation

$$X = \xi + mk\eta, \qquad Y = m\eta \qquad (17)$$

is applied to the binary form $f$ and if, in particular,


one system of invariants of $f$ under this transformation is found to be

$$D_r = A_r / A_2^{r/2}, \qquad r \le n, \qquad (18)$$

where $A_r$ is defined in (6). By the Roberts Theorem we obtain the fact that
the standard moment $\mu_r / \mu_2^{r/2}$ is an invariant of $f$ under this transformation.
Thus the standard moments, or standard seminvariants in general, have also
an algebraic connection. The effect of the transformation (17) on the roots of $f$
is indicated by

$$X - \alpha_i Y = \xi + mk\eta - \alpha_i m\eta = \xi - m(\alpha_i - k)\eta.$$
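The transformation (17) replaces each root $\alpha_i$ by $m(\alpha_i - k)$, a change of origin and scale. The sketch below, on hypothetical data, checks that the standard moments $\mu_r/\mu_2^{r/2}$ are unchanged by such a map when the scale factor is positive; with a negative factor the odd standard moments change sign while the even ones do not.

```python
import math


def central_moment(xs, k):
    """Population-style central moment of order k."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def standard_moment(xs, r):
    """mu_r / mu_2^(r/2)."""
    return central_moment(xs, r) / central_moment(xs, 2) ** (r / 2)


data = [1.0, 2.0, 2.0, 3.0, 7.0]
k_shift, m_scale = 4.0, 2.5  # hypothetical origin and positive scale
transformed = [k_shift + m_scale * x for x in data]
reflected = [-2.0 * x for x in data]  # a negative scale factor

for r in (3, 4):
    print(r, standard_moment(data, r), standard_moment(transformed, r),
          standard_moment(reflected, r))
```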
If $m$ and $k$ are defined as above, the result is the equivalent of measuring in
standard units, denoted by

$$\frac{\alpha_i - \mu'_1}{\sqrt{\mu_2}}.$$


The system (18) is not a system of algebraic invariants, for algebraic invariants
must be invariant under rotation, translation and change of scale, or stretching.
The component parts of the above system are invariant only under the last two
types of transformation. In statistics translation and change of scale ordinarily
constitute the only desired transformations, so that the standard seminvariants
might well be called statistical invariants.

TABLE I

Linearly Independent Seminvariants of Weight ≤ 8

8 Ma *“ — 60^411 —* 70/i4* *|- 210/44pa* H“ 280/ii’'^2 •— 106^2^

6 fLi + Uitttii - 56miis 8 - 35/14* - + ItOsjVa + 630#»j*

^ Ml ~ 4- 49yi^i — 3 6/11^ + 420 mi/i 2’ — 490^^2 — 630;ii*

® ^ Ml - 28 )Ji,m2 - 66 m 8 s» + iQSw’ - 420MtM2* + 66 OS 1 V 2 + 030ai^

4 Si + liitiiii - 56^8m» + 35;24’‘ - 210/mM2» + liOiiM

3 Mi - 7 siM 2 + 49 Mvti 35^,2 + 106/«4M2“ - 70/1,'m

_ 2 -f 28»u^j — 66MSMi + 35>142

11. Seminvariants and Invariants of Samples. Consideration of the definition of seminvariants and invariants shows that:

from not because it is a function of deviations 

from the mean, but because ita function of the differences of the observations; 

the'stt^dnrTu y because it is a seminvariant divided by 

wreeillrilr^^^^ because it is a ratio o^ 

w semmvanants which are of the same order in powers of the observations. 
These facts are important from the statistical viewpoint because they show that seminvariants and invariants of samples are also seminvariants and invariants of the population from which the samples are drawn.


II. ESTIMATES 

1. Power Product Seminvariants. The Roberts Theorem set up a duality relationship between seminvariants expressed in terms of coefficients and seminvariants in terms of power sums. It can be shown that corresponding to each pair thus determined there exists a third seminvariant expressed in terms of power products. This leads to what may be called a triple system of seminvariants, the interrelationships being most apparent when all three seminvariants are expressed in terms of the notation defined by (11). The seminvariant

$$\frac{a_3}{a_0} - 3\frac{a_2a_1}{a_0^2} + 2\frac{a_1^3}{a_0^3}$$

becomes in this notation

$$\frac{(111)}{n^{(3)}} - 3\frac{(11)(1)}{n^{(2)}\,n} + 2\frac{(1)^3}{n^3}.$$

The corresponding power sum seminvariant is

$$\frac{(3)}{n} - 3\frac{(2)(1)}{n^2} + 2\frac{(1)^3}{n^3},$$

while the power product seminvariant just mentioned is

$$\frac{(3)}{n} - 3\frac{(21)}{n^{(2)}} + 2\frac{(111)}{n^{(3)}}.$$


The value of the power product notation lies in the fact that the numerical coefficients of the three seminvariants are then identical, while this is not the case when monomial and elementary symmetric functions are used.

Perhaps a few remarks are in order in regard to the proof of the relationship expressed above. The annihilator, corresponding to Ω, for seminvariants in terms of roots is (see [2; 230-31])

$$D = \sum_{i=1}^{n}\frac{\partial}{\partial\alpha_i}.$$


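It is easy to see that

$$D\left[\frac{(p_1^{\pi_1}p_2^{\pi_2}\cdots p_s^{\pi_s})}{n^{(\rho)}}\right] = \sum_{i=1}^{s}\pi_i p_i\,\frac{(p_1^{\pi_1}\cdots p_i^{\pi_i-1}(p_i-1)\cdots p_s^{\pi_s})}{n^{(\rho)}},$$

where $\rho = \pi_1 + \cdots + \pi_s$ is the number of parts, and also that

$$\frac{(p_1^{\pi_1}p_2^{\pi_2}\cdots p_s^{\pi_s}\,0)}{n^{(\rho)}} = \frac{(n-\rho+1)(p_1^{\pi_1}\cdots p_s^{\pi_s})}{n^{(\rho)}} = \frac{(p_1^{\pi_1}\cdots p_s^{\pi_s})}{n^{(\rho-1)}},$$

where now ρ counts the zero part as well. Since

$$D\left[\frac{(p_1)^{\pi_1}(p_2)^{\pi_2}\cdots(p_s)^{\pi_s}}{n^{\rho}}\right] = \sum_{i=1}^{s}\pi_i p_i\,\frac{(p_1)^{\pi_1}\cdots(p_i)^{\pi_i-1}(p_i-1)\cdots(p_s)^{\pi_s}}{n^{\rho}}$$

and

$$\frac{(p_1)^{\pi_1}\cdots(p_{s-1})^{\pi_{s-1}}(0)}{n^{\rho}} = \frac{n\,(p_1)^{\pi_1}\cdots(p_{s-1})^{\pi_{s-1}}}{n^{\rho}} = \frac{(p_1)^{\pi_1}\cdots(p_{s-1})^{\pi_{s-1}}}{n^{\rho-1}},$$

it becomes evident that corresponding to any power sum seminvariant there exists a power product seminvariant with the same numerical coefficients. The converse is also true.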
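The triple system can be checked numerically. In the sketch below (an illustration with hypothetical helper names), power products are computed by brute force as sums over ordered tuples of distinct indices, $n^{(s)}$ is `math.perm(n, s)`, and each of the three weight-3 seminvariants written above is shown to be unchanged when every observation is shifted by a constant, which is the defining property of a seminvariant of the sample.

```python
from fractions import Fraction
from itertools import permutations
from math import perm, prod

def power_sum(x, p):
    # (p): sum of the p-th powers of the observations
    return sum(v ** p for v in x)

def power_product(x, parts):
    # (p1 p2 ... ps): sum of x_i^p1 x_j^p2 ... over ordered tuples of distinct indices
    return sum(prod(x[i] ** p for i, p in zip(idx, parts))
               for idx in permutations(range(len(x)), len(parts)))

def via_elementary(x):
    # (111)/n^(3) - 3 (11)(1)/(n^(2) n) + 2 (1)^3/n^3
    n = len(x)
    e1 = Fraction(power_sum(x, 1), n)
    e2 = Fraction(power_product(x, (1, 1)), perm(n, 2))
    e3 = Fraction(power_product(x, (1, 1, 1)), perm(n, 3))
    return e3 - 3 * e2 * e1 + 2 * e1 ** 3

def via_power_sums(x):
    # (3)/n - 3 (2)(1)/n^2 + 2 (1)^3/n^3, the sample third moment about the mean
    n = len(x)
    m = [Fraction(power_sum(x, p), n) for p in range(4)]
    return m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3

def via_power_products(x):
    # (3)/n - 3 (21)/n^(2) + 2 (111)/n^(3)
    n = len(x)
    return (Fraction(power_product(x, (3,)), n)
            - 3 * Fraction(power_product(x, (2, 1)), perm(n, 2))
            + 2 * Fraction(power_product(x, (1, 1, 1)), perm(n, 3)))

x = [1, 2, 4, 7]
shifted = [v + 5 for v in x]
for f in (via_elementary, via_power_sums, via_power_products):
    print(f.__name__, f(x) == f(shifted))   # each line ends in True
```

The three functions share the coefficients 1, −3, 2; only the symmetric functions they are built from differ.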


2. Unbiased Estimates of Rational Integral Moment Functions. If τ represents a population parameter, and if t represents such a function of n observations that the expected value of t is equal to τ, then t is said to be an unbiased estimate of τ. See Tschuprow [11; 74-76], Bertelsen [8; 144], and Fisher [12]. Let $(p_1p_2\cdots p_s)$ denote a power product computed from a sample, the sample being from an infinite population. Then it is well known that

$$E[(p_1p_2\cdots p_s)] = n^{(s)}\mu'_{p_1}\mu'_{p_2}\cdots\mu'_{p_s},$$

n being the number of items in the sample and $\mu'_p$ the p-th moment of the population about the origin. If E be interpreted as "unbiased estimate of," the above relation may also be written

$$E[\mu'_{p_1}\mu'_{p_2}\cdots\mu'_{p_s}] = \frac{(p_1p_2\cdots p_s)}{n^{(s)}}, \tag{19}$$

and it is seen at once that the power product seminvariants defined in section 1, if computed from a sample of n observations, are the unbiased estimates of the corresponding power sum seminvariants of the infinite population from which the sample is drawn.
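A small exact check of (19), under an assumed three-point population (values 0, 1, 3 with probabilities 1/2, 1/3, 1/6, chosen only for illustration): averaging $(p_1p_2\cdots p_s)/n^{(s)}$ over every possible sample of size n, weighted by its probability, reproduces $\mu'_{p_1}\cdots\mu'_{p_s}$ exactly. The names `VALUES`, `PROBS`, `raw_moment` and `expectation` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import perm, prod

VALUES = [0, 1, 3]                                   # hypothetical population values
PROBS = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def raw_moment(p):
    # mu'_p, the population moment about the origin
    return sum(pr * v ** p for v, pr in zip(VALUES, PROBS))

def power_product(x, parts):
    # (p1 p2 ... ps) computed from the sample x, over distinct indices
    return sum(prod(x[i] ** p for i, p in zip(idx, parts))
               for idx in permutations(range(len(x)), len(parts)))

def expectation(statistic, n):
    # exact expected value over all n-tuples of independent draws
    total = Fraction(0)
    for idx in product(range(len(VALUES)), repeat=n):
        weight = prod(PROBS[i] for i in idx)
        total += weight * statistic([VALUES[i] for i in idx])
    return total

n, parts = 4, (2, 1, 1)
statistic = lambda x: Fraction(power_product(x, parts), perm(n, len(parts)))
print(expectation(statistic, n) == prod(raw_moment(p) for p in parts))  # True
```

The identity holds because the indices in a power product are distinct, so the factors are independent draws.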

This provides an algebraic interpretation as well as a different approach to a topic which has already aroused considerable interest among statisticians. In 1927 Bertelsen [8; 144] gave the estimates of the first four Thiele seminvariants of the population in terms of Thiele seminvariants of the sample. In 1929 R. A. Fisher [12] also obtained these results and gave in addition the estimates of the fifth and sixth Thiele seminvariants. His results are in terms of sample moments. In 1937, P. S. Dwyer [13; 26] gave the estimates of the first five population central moments and indicated also means for obtaining the estimate of any rational integral isobaric moment function. In the remainder of this chapter

(1) Dwyer's method will be extended and perhaps somewhat simplified,

(2) certain properties of this type of estimate will be pointed out,

(3) estimates of all seminvariants of weight ≤ 8 will be made available.


3. Computation of Estimates. From the relationship (19) it is possible to write down immediately, in a simple although not immediately useful form, the estimate of any rational integral moment function. Thus the fourth Thiele seminvariant $\lambda_4$ is given by

$$\lambda_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3{\mu'_2}^2 + 12\mu'_2{\mu'_1}^2 - 6{\mu'_1}^4,$$

so that the estimate of $\lambda_4$ is

$$L_4 = \frac{(4)}{n} - 4\frac{(31)}{n^{(2)}} - 3\frac{(22)}{n^{(2)}} + 12\frac{(211)}{n^{(3)}} - 6\frac{(1111)}{n^{(4)}}.$$

Since power products are difficult to compute directly, it is necessary to express the estimates in terms of power sums. Dwyer [6; 30-33] gave a complete discussion of the problem of expanding power products in terms of power sums and also gave tables of power products in terms of power sums for weights ≤ 6. By use of (12) it is also possible to use tables giving monomial symmetric functions in terms of power sums. One table, by J. R. Roe [14; plate 18], includes all cases of weight ≤ 10.

By use of such a table we find

$$\begin{aligned}
(31) &= -(4) + (3)(1),\\
(22) &= -(4) + (2)^2,\\
(211) &= 2(4) - 2(3)(1) - (2)^2 + (2)(1)^2,\\
(1111) &= -6(4) + 8(3)(1) + 3(2)^2 - 6(2)(1)^2 + (1)^4.
\end{aligned}$$
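The expansions above can also be generated mechanically. The sketch below uses the standard Möbius-inversion formula over set partitions of the parts (a swapped-in technique, not the table look-up described in the text): each partition of the parts into blocks contributes the product of the power sums of the block totals, weighted by $\prod_B (-1)^{|B|-1}(|B|-1)!$, the same weights $(-1)^{\rho-1}(\rho-1)!$ that appear in Dwyer's theorem. The function names are illustrative.

```python
from collections import Counter
from math import factorial

def set_partitions(items):
    # yield every partition of the list `items` into non-empty blocks
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition               # `first` in a block of its own
        for i in range(len(partition)):           # or joined to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def expand_power_product(parts):
    """Power-sum expansion of (p1 p2 ... ps); each key lists the power sums in a product."""
    result = Counter()
    for blocks in set_partitions(list(parts)):
        coeff = 1
        for block in blocks:
            coeff *= (-1) ** (len(block) - 1) * factorial(len(block) - 1)
        key = tuple(sorted((sum(block) for block in blocks), reverse=True))
        result[key] += coeff
    return {key: c for key, c in result.items() if c != 0}

print(sorted(expand_power_product((2, 1, 1)).items()))
# [((2, 1, 1), 1), ((2, 2), -1), ((3, 1), -2), ((4,), 2)]
```

Read the output as $(211) = (2)(1)^2 - (2)^2 - 2(3)(1) + 2(4)$, in agreement with the table.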

If these results are substituted in $L_4$ above and like terms are collected, it is found that

$$n^{(4)}L_4 = n^2(n+1)(4) - 4n(n+1)(3)(1) - 3n(n-1)(2)^2 + 12n(2)(1)^2 - 6(1)^4,$$

a result which agrees with that given by R. A. Fisher [12].
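As a check on the algebra, the following sketch (helper names are illustrative) evaluates $L_4$ twice for a small integer sample. The first evaluation uses the power-product definition directly, with each power product summed over distinct indices. The second uses the power-sum formula for $n^{(4)}L_4$. Because the arithmetic is exact, the comparison is a strict equality test.

```python
from fractions import Fraction
from itertools import permutations
from math import perm, prod

def power_product(x, parts):
    # (p1 p2 ... ps): sum over ordered tuples of distinct indices
    return sum(prod(x[i] ** p for i, p in zip(idx, parts))
               for idx in permutations(range(len(x)), len(parts)))

def L4_from_power_products(x):
    n = len(x)
    return (Fraction(power_product(x, (4,)), n)
            - 4 * Fraction(power_product(x, (3, 1)), perm(n, 2))
            - 3 * Fraction(power_product(x, (2, 2)), perm(n, 2))
            + 12 * Fraction(power_product(x, (2, 1, 1)), perm(n, 3))
            - 6 * Fraction(power_product(x, (1, 1, 1, 1)), perm(n, 4)))

def L4_from_power_sums(x):
    n = len(x)
    s = [sum(v ** p for v in x) for p in range(5)]   # power sums (0), ..., (4)
    numerator = (n**2 * (n + 1) * s[4] - 4 * n * (n + 1) * s[3] * s[1]
                 - 3 * n * (n - 1) * s[2]**2 + 12 * n * s[2] * s[1]**2 - 6 * s[1]**4)
    return Fraction(numerator, perm(n, 4))

x = [3, -1, 4, 1, -5, 9]
print(L4_from_power_products(x) == L4_from_power_sums(x))  # True
```

The power-sum form is the practical one: it needs only five sums over the data, while the power-product form grows like $n^4$.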


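4. The Dwyer Double Expansion Theorem. The Dwyer double expansion theorem, [6; 34] and [13; 37-39], states that if any isobaric sum of power products of weight r indicated by

$$\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\,\pi_1!\cdots\pi_s!}\,b_{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1^{\pi_1}\cdots p_s^{\pi_s}) \tag{20}$$

be expanded in terms of power sums in a form indicated by

$$\sum\frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\,\pi_1!\cdots\pi_s!}\,a_{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1)^{\pi_1}\cdots(p_s)^{\pi_s}, \tag{21}$$

then the coefficient $a_r$ of the power sum (r) is given by

$$a_r = \sum(-1)^{\rho-1}\frac{(\rho-1)!\,r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\,\pi_1!\cdots\pi_s!}\,b_{p_1^{\pi_1}\cdots p_s^{\pi_s}}, \tag{22}$$

where $\rho = \pi_1 + \cdots + \pi_s$, and that the coefficient $a_{r_1r_2\cdots r_\sigma}$ of $(r_1)(r_2)\cdots(r_\sigma)$ is

$$a_{r_1r_2\cdots r_\sigma} = \overline{a_{r_1}a_{r_2}\cdots a_{r_\sigma}}. \tag{23}$$

The barred product indicates a symbolic multiplication by suffixing of subscripts, which is exemplified by

$$\overline{a_3a_2} = \overline{(b_3 - 3b_{21} + 2b_{111})(b_2 - b_{11})} = b_{32} - b_{311} - 3b_{221} + 5b_{2111} - 2b_{11111}.$$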
The application of this theorem to the present problem eliminates the use of tables and permits the independent computation of the coefficient of any particular product of power sums in the expansion, in terms of power sums, of any given estimate. The illustration given by Dwyer [13; 39-40] exemplifies both of these points very well.
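The barred product is easy to mechanize. In the sketch below (hypothetical helper names), a coefficient $a_r$ is stored as a dictionary that maps each partition of r to its multiplier of $b_{\text{partition}}$, computed from (22). The function `barred_product` then multiplies such dictionaries, concatenating and re-sorting the subscripts, and so reproduces the example $\overline{a_3a_2}$ given above.

```python
from collections import Counter
from math import factorial

def partitions(r, largest=None):
    # partitions of r as non-increasing tuples
    largest = r if largest is None else largest
    if r == 0:
        yield ()
        return
    for p in range(min(r, largest), 0, -1):
        for rest in partitions(r - p, p):
            yield (p,) + rest

def a_coefficient(r):
    """a_r from (22) as a combination of the b's (keys are partitions of r)."""
    coeffs = {}
    for part in partitions(r):
        rho = len(part)
        denom = 1
        for p in part:
            denom *= factorial(p)
        for mult in Counter(part).values():
            denom *= factorial(mult)
        sign = (-1) ** (rho - 1)
        coeffs[part] = sign * (factorial(rho - 1) * factorial(r) // denom)
    return coeffs

def barred_product(*factors):
    """Symbolic product that suffixes subscripts: b_p * b_q -> b_{pq} (parts re-sorted)."""
    result = Counter({(): 1})
    for factor in factors:
        new = Counter()
        for key, c in result.items():
            for part, d in factor.items():
                new[tuple(sorted(key + part, reverse=True))] += c * d
        result = new
    return {k: v for k, v in result.items() if v != 0}

print(a_coefficient(3))   # {(3,): 1, (2, 1): -3, (1, 1, 1): 2}
print(barred_product(a_coefficient(3), a_coefficient(2)))
```

Because the subscripts of a power product are unordered, $b_{212}$ and $b_{1112}$ are merged into $b_{221}$ and $b_{2111}$, which is where the coefficient 5 comes from.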

5. Estimates of all Seminvariants of Weight ≤ 8. If the estimates of any complete system of seminvariants and of all products of those seminvariants up to and including weight r are known, then the estimate of any seminvariant of weight ≤ r is obtainable as a linear combination of these known estimates. For example, suppose that we know the estimates of all Thiele seminvariants of weight ≤ 5 and wish to find the estimate of $\mu_5$. Since $\mu_5 = \lambda_5 + 10\lambda_3\lambda_2$,

$$M_5 = E[\mu_5] = E[\lambda_5] + 10E[\lambda_3\lambda_2] = L_5 + 10L_{32}.$$

In table II are given the estimates of all Thiele seminvariants and all products of Thiele seminvariants of weight ≤ 8. From this table the expressions for $L_5$ and $L_{32}$ are obtained and, by taking the combination indicated above, it is seen that

$$\begin{aligned}
n^{(5)}M_5 = {} & (n^4 - 5n^3 + 10n^2)(5) - 5(n^3 - 5n^2 + 10n)(4)(1) \\
& - 10(n^2 - 2n)(3)(2) + 10(n^2 - 4n + 8)(3)(1)^2 \\
& + 30(n - 2)(2)^2(1) - 10n(2)(1)^3 + 4(1)^5,
\end{aligned}$$

a result which checks with that given by Dwyer [13; 27]. In similar fashion the estimate of any other seminvariant of weight ≤ 8 can be obtained by use of table II.
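The power-sum formula for $n^{(5)}M_5$ can be tested for unbiasedness by exact enumeration. The sketch assumes an illustrative three-point population (values 0, 1, 3 with probabilities 1/2, 1/3, 1/6). It computes the expected value of the estimate over every sample of size 5 and of size 6 and compares it with the population central moment $\mu_5$. The coefficients are transcribed from the formula for $n^{(5)}M_5$, with the $(3)(2)$ coefficient taken as $-10(n^2 - 2n)$, which is the value obtained by expanding the power-product form $(5)/n - 5(41)/n^{(2)} + 10(311)/n^{(3)} - 10(2111)/n^{(4)} + 4(11111)/n^{(5)}$ directly.

```python
from fractions import Fraction
from itertools import product
from math import perm, prod

VALUES = [0, 1, 3]                                   # hypothetical population
PROBS = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def m5_estimate(x):
    n = len(x)
    s = [sum(v ** p for v in x) for p in range(6)]   # power sums (0), ..., (5)
    num = ((n**4 - 5*n**3 + 10*n**2) * s[5]
           - 5 * (n**3 - 5*n**2 + 10*n) * s[4] * s[1]
           - 10 * (n**2 - 2*n) * s[3] * s[2]
           + 10 * (n**2 - 4*n + 8) * s[3] * s[1]**2
           + 30 * (n - 2) * s[2]**2 * s[1]
           - 10 * n * s[2] * s[1]**3
           + 4 * s[1]**5)
    return Fraction(num, perm(n, 5))

def population_mu5():
    mean = sum(p * v for v, p in zip(VALUES, PROBS))
    return sum(p * (v - mean) ** 5 for v, p in zip(VALUES, PROBS))

def expected(statistic, n):
    # exact expectation over all n-tuples of independent draws
    total = Fraction(0)
    for idx in product(range(len(VALUES)), repeat=n):
        total += prod(PROBS[i] for i in idx) * statistic([VALUES[i] for i in idx])
    return total

print(expected(m5_estimate, 5) == population_mu5())  # True
print(expected(m5_estimate, 6) == population_mu5())  # True
```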

6. Computation Checks. There are a number of checks which can be applied to the entries in table II. These may be of interest simply as properties of the estimates, and they may be of use in correcting errors which may possibly have crept into the tables.

When any power product of more than one part is expanded into power sums, the sum of the numerical coefficients of the expansion is zero. To prove this we need only consider a set of observations of which one observation is unity and the rest are all zero. Then any power product of two or more parts is necessarily zero and all power sums are equal to unity. Hence the initial statement of the paragraph follows immediately.

From this fact it is apparent that the sum of the coefficients of $L_r$ is $1/n$, and the sum of the coefficients of $L_{r_1r_2\cdots r_s}$ is zero. Thus for $L_4$ we have

$$\frac{n^3 + n^2 - 4(n^2 + n) - 3(n^2 - n) + 12n - 6}{n^{(4)}} = \frac{1}{n},$$

and for Ln the sum of the coefficients is




(S + «8 - s“)8- 
9(n^ -14«>+95?i» - 322?i+420) I -3(ii‘ - lOn* + 104n* - 305n + -3(3n’ - 33n* + 128?i - 168) n* ~ 18n’ + 125n* - 384n + 441
A condition satisfied by the coefficients of any seminvariant is that their sum is equal to zero (see section 2). This provides another check on the entries of table II, although the seminvariant must be written in homogeneous form before the check is applied. Thus we may write

$$\frac{n^{(4)}}{n^3}L_4 = (n+1)\frac{(4)}{n} - 4(n+1)\frac{(3)(1)}{n^2} - 3(n-1)\frac{(2)^2}{n^2} + 12n\frac{(2)(1)^2}{n^3} - 6n\frac{(1)^4}{n^4},$$

and the sum of the coefficients is

$$(n+1) - 4(n+1) - 3(n-1) + 12n - 6n = 0.$$
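Both checks are mechanical. The sketch below (an illustration with hypothetical names) encodes the five coefficients of $n^{(4)}L_4$ as polynomials in n. It then verifies, for a range of sample sizes, that these coefficients divided by $n^{(4)}$ sum to $1/n$, and that the coefficients of the homogeneous form sum to zero.

```python
from fractions import Fraction
from math import perm

def l4_coefficients(n):
    # coefficients of n^(4) L_4 on (4), (3)(1), (2)^2, (2)(1)^2, (1)^4
    return [n**2 * (n + 1), -4 * n * (n + 1), -3 * n * (n - 1), 12 * n, -6]

DEGREES = [1, 2, 2, 3, 4]   # number of power-sum factors in each of those terms

def first_check(n):
    # the sum of the coefficients of L_4 must equal 1/n
    return Fraction(sum(l4_coefficients(n)), perm(n, 4)) == Fraction(1, n)

def second_check(n):
    # homogeneous form: n^(4) L_4 / n^3 = sum of (c * n^d / n^3) * [(...)/n^d];
    # the coefficients c * n^d / n^3 must sum to zero
    return sum(Fraction(c * n**d, n**3) for c, d in zip(l4_coefficients(n), DEGREES)) == 0

print(all(first_check(n) and second_check(n) for n in range(4, 30)))  # True
```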

Several checks arise from the fact (see section 6) that every seminvariant must be annihilated by the operator

$$\Omega' = \sum_{r\ge 1} r\,m_{r-1}\frac{\partial}{\partial m_r}, \qquad m_0 = 1. \tag{24}$$

Another check results from the discussion of the next section and is so apparent 
as to need no comment. 

All the checks mentioned in this section are applicable to the estimate of any 
seminvariant. 


7. Estimates as Sums of Simple Seminvariants. A seminvariant such as $L_4$, in which the coefficients of the $m_r$'s are functions of n, will be called a composite seminvariant, while a seminvariant in which the coefficients of the $m_r$'s are purely numerical will be called simple. The fact to be established in this section is that every composite seminvariant is the sum of simple seminvariants. As an illustration consider $L_4$. It is apparent that

$$L_4 = \frac{n^4}{n^{(4)}}\,l_4 + \frac{n^3}{n^{(4)}}\,k_4,$$

where $l_4$ and $k_4$ are seminvariants of the sample corresponding to $\lambda_4$ and $\kappa_4$. Both $l_4$ and $k_4$ are simple seminvariants.
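A numerical check of this decomposition. The sketch writes the two simple seminvariants explicitly in terms of the sample moments about the origin, $m_r = (r)/n$, as $l_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4$ and $k_4 = m_4 - 4m_3m_1 + 3m_2^2$. These explicit forms are assumptions of the sketch, consistent with $\kappa_4 = \mu_4 + 3\mu_2^2$. The sketch then compares both sides exactly, computing $L_4$ from Fisher's power-sum formula.

```python
from fractions import Fraction
from math import perm

def raw_sample_moments(x, top):
    # m_0, ..., m_top: sample moments about the origin, m_r = (r)/n
    n = len(x)
    return [Fraction(sum(v ** p for v in x), n) for p in range(top + 1)]

def L4(x):
    # unbiased estimate of lambda_4 from the power sums
    n = len(x)
    s = [sum(v ** p for v in x) for p in range(5)]
    num = (n**2 * (n + 1) * s[4] - 4 * n * (n + 1) * s[3] * s[1]
           - 3 * n * (n - 1) * s[2]**2 + 12 * n * s[2] * s[1]**2 - 6 * s[1]**4)
    return Fraction(num, perm(n, 4))

def l4_and_k4(x):
    m = raw_sample_moments(x, 4)
    l4 = m[4] - 4*m[3]*m[1] - 3*m[2]**2 + 12*m[2]*m[1]**2 - 6*m[1]**4
    k4 = m[4] - 4*m[3]*m[1] + 3*m[2]**2
    return l4, k4

x = [3, -1, 4, 1, -5, 9, 2]
n = len(x)
l4, k4 = l4_and_k4(x)
print(L4(x) == Fraction(n**4, perm(n, 4)) * l4 + Fraction(n**3, perm(n, 4)) * k4)  # True
```

Both coefficients tend to 0 or 1 as n grows, which is the sense in which the leading simple part gives a first approximation.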

That a composite seminvariant may always be expressed as a sum of simple seminvariants can be demonstrated by considering the effect of Ω', (24), on a composite seminvariant. The coefficients (once the common denominator is cleared) are polynomials in n and are unaffected by the operator. The expression resulting from application of the operator can vanish only if the coefficient of $n^r$ vanishes for every r. Thus a composite seminvariant which has r different powers of n appearing in its coefficients is expressible as the sum of r simple seminvariants, which are not necessarily distinct. Table III exhibits the estimates of Thiele seminvariants of weight ≤ 6 as sums of simple seminvariants.

Since the factors appearing in front of each of the simple seminvariants, in the expression resulting from breaking down a composite seminvariant, are of successively lower order with respect to n, it is possible to obtain approximations of various orders to the value of an estimate by using the appropriate portion of the expression given in the table.

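8. The Estimates of the κ's. The seminvariant $\kappa_r$ possesses an interesting property which will be called invariance under estimate. By this is meant that the estimate of $\kappa_r$ is $k_r$ multiplied by a suitable factor. In particular, $\kappa_2 = \mu_2$ and $\kappa_3 = \mu_3$, and it is well known that

$$E[\kappa_2] = \frac{n^2}{n^{(2)}}\,k_2, \qquad E[\kappa_3] = \frac{n^3}{n^{(3)}}\,k_3,$$

so that the $\kappa_r$ certainly possess the property for r = 2 and 3. It can be shown, however, that

$$E[\kappa_{2r}] = \frac{n^2}{n^{(2)}}\,k_{2r}, \qquad E[\kappa_{2r+1}] = \frac{n^3}{n^{(3)}}\,k_{2r+1}. \tag{25}$$

From (15),

$$\kappa_{2r} = \frac{1}{2}\sum_{i=0}^{2r}(-1)^i\binom{2r}{i}\mu'_i\mu'_{2r-i}, \qquad \mu'_0 = 1,$$

so that

$$E[\kappa_{2r}] = \frac{(2r)}{n} + \frac{1}{2}\sum_{i=1}^{2r-1}(-1)^i\binom{2r}{i}\frac{(i\;\;2r-i)}{n^{(2)}}.$$

By the Binet-Waring identities [15; 6-7]

$$(a\;b) = (a)(b) - (a+b), \tag{26}$$

and this holds for power products regardless of the values of a and b. Hence

$$E[\kappa_{2r}] = \frac{(2r)}{n} + \frac{1}{2}\sum_{i=1}^{2r-1}(-1)^i\binom{2r}{i}\frac{(i)(2r-i) - (2r)}{n^{(2)}}.$$

Since

$$\frac{1}{2}\sum_{i=1}^{2r-1}(-1)^i\binom{2r}{i} = -1,$$

the coefficient of $(2r)/n$ above is $1 + 1/(n-1) = n/(n-1)$, and it follows immediately that

$$E[\kappa_{2r}] = \frac{(2r)}{n-1} + \frac{1}{2}\sum_{i=1}^{2r-1}(-1)^i\binom{2r}{i}\frac{(i)(2r-i)}{n^{(2)}} = \frac{n^2}{n^{(2)}}\,k_{2r}.$$

This proves the first half of (25), and the second half can be proved in similar fashion, although with considerably more difficulty.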
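The first half of (25) can be checked numerically for 2r = 2, 4, 6, 8. The sketch assumes the quadratic form $\kappa_{2r} = \tfrac12\sum_{i=0}^{2r}(-1)^i\binom{2r}{i}\mu'_i\mu'_{2r-i}$ (with $\mu'_0 = 1$) for (15). It builds the estimate term by term from power products, as in the derivation, and compares the result with $n^2/n^{(2)}$ times the sample value $k_{2r}$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, perm

def two_part_product(x, a, b):
    # power product (a b): sum of x_i^a x_j^b over ordered pairs i != j
    return sum(x[i] ** a * x[j] ** b for i, j in permutations(range(len(x)), 2))

def kappa_even_estimate(x, two_r):
    # (2r)/n + 1/2 * sum_{i=1}^{2r-1} (-1)^i C(2r, i) (i, 2r-i) / n^(2)
    n = len(x)
    estimate = Fraction(sum(v ** two_r for v in x), n)
    for i in range(1, two_r):
        estimate += Fraction((-1) ** i * comb(two_r, i) * two_part_product(x, i, two_r - i),
                             2 * perm(n, 2))
    return estimate

def kappa_even_sample(x, two_r):
    # k_{2r} = 1/2 * sum_{i=0}^{2r} (-1)^i C(2r, i) m_i m_{2r-i}, with m_i = (i)/n and m_0 = 1
    n = len(x)
    m = [Fraction(sum(v ** p for v in x), n) for p in range(two_r + 1)]
    return sum((-1) ** i * comb(two_r, i) * m[i] * m[two_r - i]
               for i in range(two_r + 1)) / 2

x = [4, -2, 7, 0, 1, 3]
n = len(x)
for two_r in (2, 4, 6):
    print(two_r, kappa_even_estimate(x, two_r)
          == Fraction(n**2, perm(n, 2)) * kappa_even_sample(x, two_r))  # True each time
```

For 2r = 2 this is the familiar statement that $n/(n-1)$ times the sample variance is unbiased.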
9. Other Simple Seminvariants which are Invariant under Estimate. It has been previously remarked (Chapter I, section 2) that the κ system of seminvariants are the seminvariants of minimum degree, those of even weight being of second degree and those of odd weight being of third degree. The $\kappa_{2r}$'s are the only seminvariants of degree 2, but for odd weights greater than 7 there exists more than one seminvariant of degree 3. It is not difficult to show that these additional minimum degree seminvariants are also invariant under estimate. The type of proof used could have been applied equally well to obtain the results of the preceding section, and it indicates that the property of invariance under estimate possessed by the κ's is a direct result of their minimum degree property.

Consider the estimate in power product form of any seminvariant of degree 3 and odd weight. Power products of 1, 2 and 3 parts will appear. By the Binet-Waring identities each three-part power product (abc) yields a third degree power sum product (a)(b)(c) plus other products of lower degree. Since (a)(b)(c) comes only from (abc), its coefficient must be identical with that of (abc) and will therefore be a constant divided by $n^{(3)}$. The coefficient of each second degree product of power sums will be a sum of terms, the first of which comes from the corresponding two-part power product with a coefficient identical with that of the power product, while the others come from the three-part power products. Then the coefficient of a second degree product of power sums must be of the form

$$\frac{c_1}{n^{(2)}} + \frac{c_2 + c_3 + \cdots + c_s}{n^{(3)}} = \frac{c_1'n + c_2'}{n^{(3)}}.$$

Similarly the coefficient of the first degree power sum term will be of the form

$$\frac{d_1n^2 + d_2n + d_3}{n^{(3)}}.$$

Since the estimate of a seminvariant is a seminvariant, it follows that $d_3 = 0$.
This is true because the coefficient of n,ust be the coefficient of ^ 

immedia^ly^P688iblI^to^b™aTtL'"cfLTO°-^ contrary be assumed it is 

semlnvariants^he first beffig of «e^vanant into two simple 

SowJ thaTan^eSiv jianul Sefs''” a^'eliste! 

estimate, is also apparent that the 

10. Composite Seminvariants which are Invariant under Estimate. For each weight r ≥ 4 there is a composite seminvariant which is invariant under estimate. For weights 4 and 5 this seminvariant is easily obtained by use
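of Table III. Thus for weight 4, form the seminvariant $\lambda_4 + c_{22}\lambda_2^2$. From the table we find that

$$\begin{aligned}
E[\lambda_4 + c_{22}\lambda_2^2] &= \frac{n^4}{n^{(4)}}\,l_4 + \frac{n^3}{n^{(4)}}\,k_4 + c_{22}\frac{n^4}{n^{(4)}}\,l_2^2 - c_{22}\frac{n}{(n-2)(n-3)}\,k_4 \\
&= \frac{n^4}{n^{(4)}}\left(l_4 + c_{22}l_2^2\right) + \frac{n^3 - n^2(n-1)c_{22}}{n^{(4)}}\,k_4.
\end{aligned}$$

If $c_{22} = n/(n-1)$, the seminvariant is invariant under estimate. This seminvariant is

$$\lambda_4 + \frac{n}{n-1}\lambda_2^2. \tag{27}$$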


In similar fashion we find for weight 5

$$\lambda_5 + \frac{5n}{n-1}\lambda_3\lambda_2. \tag{28}$$
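The weight-5 result can be checked by brute force. In the sketch below, $\lambda_5$ and $\lambda_3\lambda_2$ are written as polynomials in moments about the origin. These are standard expansions, not taken from table III, and the dictionary keys list the subscripts. Each product $\mu'_{p_1}\cdots\mu'_{p_s}$ is replaced by $(p_1\cdots p_s)/n^{(s)}$ as in (19). The estimate of $\lambda_5 + \frac{5n}{n-1}\lambda_3\lambda_2$ is then compared with $n^5/n^{(5)}$ times the same function of the sample moments. That factor $n^5/n^{(5)}$ is derived for this sketch and is not stated in the text.

```python
from fractions import Fraction
from itertools import permutations
from math import perm, prod

# {(p1, ..., ps): coefficient} stands for sum of coefficient * mu'_{p1} ... mu'_{ps}
LAMBDA5 = {(5,): 1, (4, 1): -5, (3, 2): -10, (3, 1, 1): 20, (2, 2, 1): 30,
           (2, 1, 1, 1): -60, (1, 1, 1, 1, 1): 24}
LAMBDA3_LAMBDA2 = {(3, 2): 1, (3, 1, 1): -1, (2, 2, 1): -3, (2, 1, 1, 1): 5,
                   (1, 1, 1, 1, 1): -2}

def power_product(x, parts):
    return sum(prod(x[i] ** p for i, p in zip(idx, parts))
               for idx in permutations(range(len(x)), len(parts)))

def estimate(poly, x):
    # replace each mu'_{p1} ... mu'_{ps} by (p1 ... ps) / n^(s), as in (19)
    n = len(x)
    return sum(c * Fraction(power_product(x, parts), perm(n, len(parts)))
               for parts, c in poly.items())

def sample_value(poly, x):
    # the same polynomial evaluated at the sample moments m_p = (p)/n
    n = len(x)
    return sum(c * prod(Fraction(sum(v ** p for v in x), n) for p in parts)
               for parts, c in poly.items())

x = [2, -1, 0, 5, 3, 1, -4]
n = len(x)
c = Fraction(5 * n, n - 1)
lhs = estimate(LAMBDA5, x) + c * estimate(LAMBDA3_LAMBDA2, x)
rhs = Fraction(n**5, perm(n, 5)) * (sample_value(LAMBDA5, x) + c * sample_value(LAMBDA3_LAMBDA2, x))
print(lhs == rhs)  # True
```

Replacing `c` by 12 gives the simple third-degree seminvariant of weight 5. Its estimate carries the factor $n^3/n^{(3)}$ instead, in line with (25).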


For weights > 5 considerably more difficulty is encountered. For weight 6, for example, we consider the seminvariant

$$\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3.$$

By use of table III we obtain

$$E[\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3] = \frac{n^6}{n^{(6)}}\left(l_6 + c_{42}l_4l_2 + c_{33}l_3^2 + c_{222}l_2^3\right) + \phi,$$

where φ is a sum of other seminvariants with coefficients which are functions of n and of $c_{42}$, $c_{33}$, $c_{222}$. Now there are only four linearly independent seminvariants of weight 6, and it is necessary that one of these involve the term in $(1)^6$. By an argument analogous to that of the previous section, this term cannot appear in φ, and therefore φ is expressible in terms of three or fewer seminvariants. Actually three are necessary, and by equating the coefficients of these to zero the values of $c_{42}$, $c_{33}$ and $c_{222}$ are uniquely determined. The result is somewhat lengthy and scarcely of sufficient interest to record here.

The same sort of procedure can be used for determining seminvariants of higher order which are invariant under estimate, but the labor of computation becomes very great.

It is possible to obtain moment functions which are invariant under estimate by means of a set of equations given by Dwyer [13; 38-39]. These equations connect the coefficients of a general isobaric moment function and the coefficients of the expected value of that function. In his notation, if, for example,

$$f = a_4(4) + 4a_{31}(3)(1) + 3a_{22}(2)^2 + 6a_{211}(2)(1)^2 + a_{1111}(1)^4,$$

then

$$E[f] = b_4\,n\mu'_4 + 4b_{31}\,n^{(2)}\mu'_3\mu'_1 + 3b_{22}\,n^{(2)}{\mu'_2}^2 + 6b_{211}\,n^{(3)}\mu'_2{\mu'_1}^2 + b_{1111}\,n^{(4)}{\mu'_1}^4,$$

where


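$$\begin{aligned}
a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111} &= b_4,\\
a_{31} + 3a_{211} + a_{1111} &= b_{31},\\
a_{22} + 2a_{211} + a_{1111} &= b_{22},\\
a_{211} + a_{1111} &= b_{211},\\
a_{1111} &= b_{1111}.
\end{aligned} \tag{29}$$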


The problem at hand demands that

$$X\,E\left[na_4\frac{(4)}{n} + 4n^2a_{31}\frac{(3)(1)}{n^2} + 3n^2a_{22}\frac{(2)^2}{n^2} + 6n^3a_{211}\frac{(2)(1)^2}{n^3} + n^4a_{1111}\frac{(1)^4}{n^4}\right] = na_4\mu'_4 + 4n^2a_{31}\mu'_3\mu'_1 + 3n^2a_{22}{\mu'_2}^2 + 6n^3a_{211}\mu'_2{\mu'_1}^2 + n^4a_{1111}{\mu'_1}^4,$$


so that the equations (29) become

$$\begin{aligned}
n^4a_{1111} &= X\,n^{(4)}a_{1111},\\
n^3a_{211} &= X\,n^{(3)}(a_{211} + a_{1111}),\\
n^2a_{22} &= X\,n^{(2)}(a_{22} + 2a_{211} + a_{1111}),\\
n^2a_{31} &= X\,n^{(2)}(a_{31} + 3a_{211} + a_{1111}),\\
na_4 &= X\,n(a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111}),
\end{aligned}$$

and from these equations $a_4$, $a_{31}$, $a_{22}$, $a_{211}$ can be found in terms of $a_{1111}$. Obviously there is only one solution if none of the a's are zero. In general, for any weight r, a similar system of equations can be found, and they determine the coefficients of a moment function of weight r which is invariant under estimate. It appears that this moment function is always a seminvariant, although no proof of the fact has been found. The moment functions of weights 4, 5 and 6 obtained by this method are identical with the seminvariants of those weights defined above.
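For weight 4 the system can be solved directly. The sketch below takes the equations in the form $n^4a_{1111} = Xn^{(4)}a_{1111}$, $n^3a_{211} = Xn^{(3)}(a_{211} + a_{1111})$, $n^2a_{22} = Xn^{(2)}(a_{22} + 2a_{211} + a_{1111})$, $n^2a_{31} = Xn^{(2)}(a_{31} + 3a_{211} + a_{1111})$, $na_4 = Xn(a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111})$. It sets $a_{1111} = 1$, reads $X = n^4/n^{(4)}$ off the first equation, and solves the rest in order. Expressed in the sample moments about the origin, the resulting function should be proportional to $\lambda_4 + \frac{n}{n-1}\lambda_2^2$, that is, to $m_4 - 4m_3m_1 + (c-3)m_2^2 + (12-2c)m_2m_1^2 + (c-6)m_1^4$ with $c = n/(n-1)$. The function name is hypothetical.

```python
from fractions import Fraction
from math import perm

def weight4_invariant_coefficients(n):
    """Solve the weight-4 equations with a_1111 = 1; return the coefficients of
    m4, m3*m1, m2^2, m2*m1^2, m1^4, scaled so that m4 has coefficient 1."""
    X = Fraction(n**4, perm(n, 4))          # forced by the a_1111 equation
    a1111 = Fraction(1)
    a211 = X * perm(n, 3) * a1111 / (n**3 - X * perm(n, 3))
    a22 = X * perm(n, 2) * (2 * a211 + a1111) / (n**2 - X * perm(n, 2))
    a31 = X * perm(n, 2) * (3 * a211 + a1111) / (n**2 - X * perm(n, 2))
    a4 = X * (4 * a31 + 3 * a22 + 6 * a211 + a1111) / (1 - X)
    coeffs = [n * a4, 4 * n**2 * a31, 3 * n**2 * a22, 6 * n**3 * a211, n**4 * a1111]
    return [coef / coeffs[0] for coef in coeffs]

n = 10
c = Fraction(n, n - 1)
print(weight4_invariant_coefficients(n) == [1, -4, c - 3, 12 - 2 * c, c - 6])  # True
```

The agreement with (27) is what the closing remark of this section asserts for weight 4.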


Conclusion. The results of this paper include:

1. A demonstration of the fact that the theory of statistical seminvariants is identical with the theory of algebraic seminvariants.

2. The introduction of new statistical seminvariants.

3. Simplification of the computation of estimates.

4. Proof that the estimate of any seminvariant is also a seminvariant.

5. Proof of the existence of a trio of seminvariants with the same numerical coefficients.

6. A discussion of seminvariants which are invariant under estimate.

Many thanks are due Professor P. S. Dwyer for his able guidance in the preparation of this paper and to Professors C. G. Craig and J. A. Nyswander for helpful comments.



SEMINVARIANTS AND THEIR ESTIMATES




REFERENCES

[1] L. E. Dickson, Algebraic Invariants (1914).

[2] E. B. Elliott, An Introduction to the Algebra of Quantics (1895). Reference is to the second edition (1913).

[3] J. Hammond, "On the Solution of the Differential Equation of Sources," Amer. Jour. of Math., Vol. 6 (1882), pp. 218-227.

[4] P. A. MacMahon, "Seminvariants and Symmetric Functions," Amer. Jour. of Math., Vol. 6 (1884), pp. 131-163.

[5] Burnside and Panton, Theory of Equations (1881). Reference is to the 1904 edition, Vol. II.

[6] P. S. Dwyer, "Combined Expansions of Products of Symmetric Power Sums and Sums of Symmetric Power Products with Applications to Sampling," Part I, Annals of Math. Stat., Vol. IX, 1 (1938), pp. 1-47; Part II, Vol. IX, 2 (1938), pp. 97-132.

[7] T. N. Thiele, Theory of Observations (1903).

[8] N. P. Bertelsen, "On the Compatibility of Frequency Constants and on Presumptive Laws of Errors," Skandinavisk Aktuarietidskrift, Vol. 10 (1927), pp. 129-156.

[9] A. Cayley, "A Memoir on Seminvariants," Amer. Jour. of Math., Vol. 7 (1885), pp. 1-26.

[10] T. N. Thiele, Almindelig Iagttagelseslære (1889).

[11] A. A. Tschuprow, Grundbegriffe und Grundprobleme der Korrelationstheorie (1926).

[12] R. A. Fisher, "Moments and Product Moments of Sampling Distributions," Proc. London Math. Soc., Vol. 2 (30) (1929), pp. 199-238.

[13] P. S. Dwyer, "Moments of Any Rational Integral Isobaric Sample Moment Function," Annals of Math. Stat., Vol. 8 (1937), pp. 21-65.

[14] J. R. Roe, "Interfunctional Expressibility Tables of Symmetric Functions," distributed by Syracuse University (1931).

[15] Faà di Bruno, Théorie des Formes Binaires (1876).

University of Michigan, 

Ann Arbor, Michigan. 



THE ERRORS INVOLVED IN EVALUATING CORRELATION DETERMINANTS

By Paul G. Hoel


1. Introduction. Many statistical problems require for their solution the evaluation of correlation determinants. The method usually employed for such evaluation is that of Chiò,¹ in which the order of the determinant is reduced by successive operations with selected pivotal elements. The repeated multiplications and subtractions involved in the method necessitate rounding off the elements in the successively reduced determinants. The calculated value of the original determinant is therefore in error, and so the question naturally arises as to the magnitude of this error.

Previous attempts to answer this question seem to be satisfied with finding an upper bound for the magnitude of the difference between the value of the original determinant and its value after its elements have been rounded off. Moreover, this bound is expressed in terms of the errors in the elements and the minors of the original determinant, whose values are assumed to be known exactly from calculation. However, several reductions are often needed before the value of the determinant can be obtained, and furthermore the minors are subject to the same type of errors as the determinant itself. The problem, therefore, is to find an upper bound for the magnitude of the difference between the final calculated value of the determinant and the determinant itself which involves only calculated quantities.

This paper treats the problem from two different points of view. In the first part an upper bound is obtained for the magnitude of the error. In the second part the first order error terms are given more detailed consideration, with the result that an upper probability bound is obtained for the error.


2. Absolute Bounds. Consider the correlation determinant $\Delta = |r_{ij}|$. To evaluate Δ by the method of Chiò, it is convenient to select diagonal elements as pivots. It will be assumed without loss of generality that the upper left diagonal element is always chosen as the pivotal element in each reduction. After each reduction, elements are rounded off to a fixed decimal accuracy. Let $a^k_{ij}$ represent the element i, j after the k-th reduction, and $x^k_{ij}$ the difference between the rounded value of element $a^k_{ij}$ and $a^k_{ij}$ itself. After k reductions, we arrive at the determinant

$$F^k = \begin{vmatrix} a^k_{k+1,k+1} + x^k_{k+1,k+1} & \cdots & a^k_{k+1,n} + x^k_{k+1,n} \\ \vdots & & \vdots \\ a^k_{n,k+1} + x^k_{n,k+1} & \cdots & a^k_{nn} + x^k_{nn} \end{vmatrix}.$$


¹ See, for example, Whittaker and Robinson, Calculus of Observations, p. 71.
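The reduction itself is short to program. The sketch below uses illustrative names and exact arithmetic via `fractions.Fraction`. At each step it pivots on the upper-left element, forms the reduced elements $a_{11}a_{ij} - a_{i1}a_{1j}$, and optionally rounds them to a fixed number of decimals. At the end it divides the final 1×1 determinant by the accumulated pivot powers $H_1^{n-2}H_2^{n-3}\cdots$, which gives the calculated value whose error is studied in this section. The 4×4 correlation matrix is a made-up example.

```python
from fractions import Fraction

def chio_value(matrix, decimals=None):
    """Calculated value of det(matrix) by Chio's method, pivoting on the upper-left
    element each time; if `decimals` is given, every reduced element is rounded."""
    a = [[Fraction(v) for v in row] for row in matrix]
    divisor = Fraction(1)
    while len(a) > 1:
        order = len(a)
        pivot = a[0][0]                        # pivotal element (assumed nonzero)
        reduced = [[pivot * a[i][j] - a[i][0] * a[0][j] for j in range(1, order)]
                   for i in range(1, order)]
        if decimals is not None:
            scale = 10 ** decimals
            reduced = [[Fraction(round(v * scale), scale) for v in row] for row in reduced]
        divisor *= pivot ** (order - 2)        # a reduction multiplies det by pivot^(order-2)
        a = reduced
    return a[0][0] / divisor

def exact_det(m):
    """Determinant by cofactor expansion along the first row (for checking only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * exact_det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

R = [[Fraction(v) for v in row] for row in [
    ["1", "0.5", "0.3", "0.2"],
    ["0.5", "1", "0.4", "0.1"],
    ["0.3", "0.4", "1", "0.25"],
    ["0.2", "0.1", "0.25", "1"],
]]
true_value = exact_det(R)
print(chio_value(R) == true_value)                     # True: no rounding, no error
print(float(chio_value(R, decimals=4) - true_value))   # the rounding error of the calculated value
```

Without rounding the method is exact. Once the reduced elements are rounded, the pivots themselves carry rounding errors, which is why the analysis below has to track the H's as well as the x's.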
By treating $F^k$ as a function of the $x^k_{ij}$, it may be expanded by Taylor's formula as follows:

$$F^k = A^k + \sum_{i,j=k+1}^{n}x^k_{ij}A^k_{ij} + \frac{1}{2!}\sum\sum_{i,j,p,q=k+1}^{n}x^k_{ij}x^k_{pq}A^k_{ij,pq} + \cdots, \tag{1}$$

where $A^k$ is the value of $F^k$ for all $x^k_{ij}$ zero, $A^k_{ij}$ is the cofactor of $a^k_{ij}$ in $A^k$, etc.

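For a determinant of order n, the value of the determinant obtained after a single reduction is the value of the original determinant multiplied by the (n − 2)th power of the pivotal element used. Applying this to $F^{k-1}$, whose order is n − k + 1, and writing $H_k = a^{k-1}_{kk} + x^{k-1}_{kk}$ for the pivotal element of the k-th reduction, it follows that

$$A^k = H_k^{\,n-k-1}F^{k-1}, \qquad A^k_{ij} = H_k^{\,n-k-2}F^{k-1}_{ij},$$

etc., where the exponents of $H_k$ are ordinary exponents rather than notation. Substituting in (1),

$$F^k = H_k^{\,n-k-1}F^{k-1} + H_k^{\,n-k-2}\sum_{i,j=k+1}^{n}x^k_{ij}F^{k-1}_{ij} + \frac{H_k^{\,n-k-3}}{2!}\sum\sum_{i,j,p,q=k+1}^{n}x^k_{ij}x^k_{pq}F^{k-1}_{ij,pq} + \cdots.$$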

In order to express $F^k$ in terms of the original determinant, this expansion will be condensed by means of the following operational notation:

$$F^k = (1 + D + D^2 + \cdots + D^{n-k})\,H_k^{\,n-k-1}F^{k-1}, \tag{2}$$

where $D^i$ operates on $H_k^{\,n-k-1}F^{k-1}$ by reducing the exponent of $H_k$ by i units, by summing from k + 1 to n the product of i terms in $x^k$ with the corresponding cofactors of $F^{k-1}$, and by dividing the result by factorial i. Using this as a recursion formula,

$$F^k = (1 + D + \cdots + D^{n-k})H_k^{\,n-k-1}(1 + D + \cdots + D^{n-k+1})H_{k-1}^{\,n-k}\cdots(1 + D + \cdots + D^{n-1})H_1^{\,n-2}F^0.$$

However,

$$F^0 = \begin{vmatrix} a_{11} + x_{11} & \cdots & a_{1n} + x_{1n} \\ \vdots & & \vdots \\ a_{n1} + x_{n1} & \cdots & a_{nn} + x_{nn} \end{vmatrix} = \Delta,$$

since we assume that $x_{ij} = 0$ for our original determinant. Consequently,

$$F^k = (1 + \cdots + D^{n-k})H_k^{\,n-k-1}(1 + \cdots + D^{n-k+1})H_{k-1}^{\,n-k}\cdots(1 + \cdots + D^{n-1})H_1^{\,n-2}\,\Delta. \tag{3}$$

Since $D^i$ operates on $F^{k-1}$ in (2) to extract the proper cofactor with i fewer rows than $F^{k-1}$, which in turn reduces the exponent of all factors $H_{k-1}$ in the expansion of $F^{k-1}$ by i units, $D^i$ reduces the exponent of all H's following it in the expansion of $F^k$ in (3) by i units.
Following these rules of operation, and expanding so as to collect terms of the same degree in the x's, we may write

$$F^k = H_k^{\,n-k-1}\cdots H_1^{\,n-2}\,\Delta + H_k^{\,n-k-2}\cdots H_1^{\,n-3}\,(\text{terms in } x_{ij}) + H_k^{\,n-k-3}\cdots H_1^{\,n-4}\,(\text{terms in } x_{ij}x_{pq}) + \cdots. \tag{4}$$

Letting $H = H_k\cdots H_1$ and $C = H_k^{\,n-k-1}\cdots H_1^{\,n-2}$, we may write

$$I = \frac{F^k}{C} = \Delta + \frac{1}{H}(\text{terms in } x_{ij}) + \frac{1}{H^2}(\text{terms in } x_{ij}x_{pq}) + \cdots,$$

and hence

$$J = I - \Delta = \frac{1}{H}(\text{terms in } x_{ij}) + \frac{1}{H^2}(\text{terms in } x_{ij}x_{pq}) + \cdots. \tag{5}$$

Now J is the difference between the calculated value of Δ, obtained by Chiò's reduction method with rounding off after each reduction, and the true value of Δ. We are interested in finding an upper bound for the magnitude of J. To accomplish this we shall first overestimate the number of terms in the various sums of (5), then find an upper bound for the magnitude of the terms in these sums, and finally combine the two results.

In counting terms by means of (3), we may ignore the H's since they merely serve as coefficients of the x's. Therefore consider the nature of the terms in

$$(1 + \cdots + D^{n-k})(1 + \cdots + D^{n-k+1})\cdots(1 + \cdots + D^{n-1}).$$

Now $(1 + \cdots + D^{s})$ contains the sums $\sum_{n-s+1}^{n}$, $\frac{1}{2!}\sum\sum_{n-s+1}^{n}$, etc.; hence it contains $s^2$ terms in $x_{ij}$, $s^2(s-1)^2/2$ terms in $x_{ij}x_{pq}$, etc. Each of these is not greater than ${}_{s^2}C_1$, ${}_{s^2}C_2$, etc.; consequently, the number of terms of each type is not greater than the coefficient of the corresponding power of D in the expansion of $(1 + D)^{s^2}$. Therefore,

$$(1 + D)^{(n-k)^2}(1 + D)^{(n-k+1)^2}\cdots(1 + D)^{(n-1)^2} = (1 + D)^m, \tag{6}$$

where $m = (n-k)^2 + \cdots + (n-1)^2$, contains at least as many terms of each type as are found in the expansion of $F^k$. This gives us the desired overestimate of the number of terms in the various sums of (5).

In finding upper bounds for the magnitudes of terms, it is to be noted that (4) is written with all common factors extracted from each set of terms of the same degree in the x's. In the parenthesis containing terms consisting of the product of r x's, the first sum will have unity for its coefficient while the last sum will have $H_k^{\,r}\cdots H_2^{\,r}$ as coefficient, with all sums between having as coefficients products of H's with exponents not exceeding r. Hence an upper bound for all coefficients in this parenthesis may be written as $\bar H^r$, where $\bar H$ is the magnitude of the product of those H's whose magnitude is greater than unity, but unity if none exceeds unity. Now terms in $x_{ij}$ are multiplied by $\Delta_{ij}$, those in $x_{ij}x_{pq}$ by $\Delta_{ij,pq}$, etc.; therefore let $\bar\Delta_{ij}$, $\bar\Delta_{ij,pq}$, etc., be the absolute values of the largest in magnitude of such cofactors. With this notation for upper bounds for magnitudes of terms, and (6) giving an upper bound for the number of terms, we may write an upper bound for the magnitude of J as follows:

$$|J| \le \bar\Delta_{ij}\,m\left(\frac{\epsilon\bar H}{|H|}\right) + \bar\Delta_{ij,pq}\;{}_mC_2\left(\frac{\epsilon\bar H}{|H|}\right)^2 + \cdots, \tag{7}$$


where $\epsilon \ge |x|$ is the maximum error of rounding. This result is valid for any determinant with real elements. All quantities on the right are available from calculations except the $\bar\Delta$; consequently this upper bound will be useful only if satisfactory bounds exist for the minors of the determinant. It can be shown that (7) holds for any minor of Δ, say $\Delta_{uv}$, if the $\bar\Delta$ have uv added as subscripts; therefore it may be applied to the question of the accuracy of least square solutions.

For the correlation determinant A it can be shown that the magnitude of a 
minor of order n — k is bounded by A;!/2** for k even and for k odd. 


Setting a — and substituting these bounds in (7), 


1 J 1 < am + ^ + a* mCt ^ • 




( 8 ) 


2 2 3 S 

. , a m , am 

< am "T —— -f- - r, 

2 2(1 — am) 


for am < 1. Since am is obtainable from the calculations for A, this is the 
desired upper bound for the error in question. 


3. Probability Bounds. In order to find probability bounds for this error, it will be necessary to expand the H's, since they involve the variables x. Consider $H_k = a^{k-1}_{kk} + x^{k-1}_{kk}$. Since $a^{k-1}_{kk}$ came from repeated reductions of Δ, it is expressible in terms of the x's and the minors of Δ. To obtain this expansion of $H_k$ consider


(?* = 


a** + iswb 


Using the same methods as for $F^k$, this may be written as

G‘ = B‘+ i: x':;-B’ii -h i SE + • • •, 

k—t+i fc-j+i 
where B‘ is the value of G‘ for all x'‘ ‘ zero, etc., and where B’ =

B’,, = HkZlGit\ etc. Substituting, 

G’ = + mz] E mzi EE + • • • 

Using operational notation here also, this may be written as 

(?' = (1 + i; + ^1= + .. - + E')Hip\G‘^\ 

where the E's operate the same as the D's, except that sums are taken from k − s + 1 to k rather than from n − s + 1 to n. Treating this as a recursion formula,


= (?' = (1 + E)HLi{l +E + E")HU 
However, 


(?* = 


On + Xn 


dll 

• 


. 

flu + Xhk 


dkh 


(1 + ■ ■ + E’‘~^)Hr'‘G\ 


= A*. 


+ Afc, 


Consequently, 

(9) H* = (1 + B)hU( 1 + E + E^)Hl.i ■ • • (1 + 

Since the E's operate on the following H's to reduce their exponents, the number of terms of various types, that is, of various degrees in the x's, will not be decreased if the order of the H's is disregarded and their exponents held fixed. Therefore consider

(10) H'k= {1 + E)il + E + E^) ... (I + E’‘~^)AtHLi ■ • • 

as an ordinary recursion formula in the H's for overestimating the number of terms of various types. If (10) is substituted for successive H's within itself in a systematic manner until no H's remain, it will be found that

(11) (l+E) {1 + . . + E’‘-^)A, 

lil+E) + E'‘~^)Ai,f .. [(1 + E)A,f-*[A,f~\ 

To merely count terms it is permissible to combine like terms to give 

HU (1 + ^ 

= (1+ J5)'‘-\l +E + eY~' ... (1 -f ... + E^YK, 

the product of the Δ's. Since the E's operate like the D's, the same arguments as those used to arrive at (6) may be used to replace each factor $(1 + E + \cdots + E^s)$ by $(1 + E)^{s^2}$ for overestimating the number of terms. Hence, the number of terms of various types in $H_k$ is not greater than those in

(1 + ^)'‘''(1 - t - £)“’•»*-' ... (1 - I - B )»-»' “"(1 = (1 - 1 - B)Y 
where ly*, = 2*’ ® + 2^ 2* * • + (fc ~ 2)* • 2° + (fc — 1)®*. Therefore the

number of terms of various types in. is not greater than in 

^^ 2 ) (1 -|- ^ 

It is easily shown that t can be condensed into the form 

(13) t = [2'-'(n -k)-l] + - k)-l] + ■.. +(k - l)“[2‘’(n ^ 

From (3) it is evident that the number of terms of various types in $F^k$ will not be greater than those in the expansion of $F^k$ when the exponents of the H's are held fixed. But from (6) we have an upper bound for the number of terms arising from the D's, and from (12) those arising from the H's; hence the number of terms in question will certainly be bounded by those in

(14) (1 + D)”-"' = (1 + DY- 

Now consider the magnitude of terms. The terms arising from the operation of D's contain minors of Δ as factors, while those arising from the operation of E's contain minors of $\Delta_i$, where i ranges from 1 to k. Let $\Delta'_{ij}$, etc., denote an upper bound for the magnitudes of all such minors with the same number of subscripts. It is easily shown that Δ' with 2r subscripts is not less than the magnitude of the product of several minors whose subscripts total 2r in number. The terms of various types also contain as factors products of the constant terms in the H's. The constant term in $H_k$, which will be denoted by $h_k$, can be obtained from (11) by operating with all ones, since it will be unaffected by disregarding the order of operation. Hence,

hk = A*A*_jAi_j • • • Aj aJ 

Since the A_i are principal minors of a positive definite determinant with no element greater than unity, h_k has unity for an upper bound. Thus, an upper bound for the magnitude of any term in the product of i x's will be t^i times A′ with 2i subscripts.

With upper bounds now available for the number of terms and the magnitudes of terms, we are in a position to consider the complete expansion of J, in which the coefficients of the x's will be constants rather than H's. Evidently the terms of first degree in the x's will come from the corresponding terms of (4), with the H's replaced by the constant terms in their expansions. If Z denotes these terms, then




(I-*-! 


(15) 




+ • • • -Y hk 


• ■ • ha 53 a:l;A(/^. 


Now consider an upper bound for |J − Z|. Since J − Z involves only terms in the product of two or more x's, we need consider an upper bound for such terms only. From the results of the two preceding paragraphs, we obtain

I J - Z 1 < -b + • • • . 



64 


PAUL G. HOEL


But from the paragraph containing (8), bounds are available for the A′; hence
\I - Z \ < + ... 


2 i as 

<i± A- * ^ 
- 2 


= Φ,

for e/i < 1. Since Z is of order ε, Φ will ordinarily be small compared with Z; therefore consider the nature of the distribution of Z.

If we write Z = a_1x_1 + ⋯ + a_px_p, then, since the x's are independently distributed with rectangular distributions, each over the interval (−ε, ε), it is easily shown that

$\mu_2 = \frac{\epsilon^2}{3}\sum a_i^2, \qquad \alpha_3 = 0, \qquad \alpha_4 = 3 - \frac{6}{5}\,\frac{\sum a_i^4}{\left(\sum a_i^2\right)^2}.$

If the a_i are approximately equal in magnitude, then α_4 is approximately equal to 3 − 6/(5p). But from (16), p > (n − k)² + ⋯ + (n − 1)², which is large enough, for determinants evaluated by Chiò's method, to justify the assumption that Z is approximately normally distributed. Setting L = ⋯,




l(n - + ... + (n - ly - ^((n - + ... + (71 _ i) 2 j] 


< y [(ra - A)' + ... + (w - 1)' - I (2n - fc - 1) j = 

Hence, the probability is > .95 that |Z| < 2'^'. Since |J − Z| < Φ, the probability is > .95 that |J| < 2^', and therefore the probability is > .95
that


(16) |Jl<fL±i, 

c 

This inequality will usually give a smaller bound for |J| than (8). However, when A is small the H's may be small, with the result that C will be small and (16) may not give a satisfactory bound for |J|. In such cases the bound given by (8) may not prove satisfactory either.
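The normality argument depends on how close α₄ is to 3 for a weighted sum of rectangular errors. The sketch below illustrates this under stated assumptions: each x_i is taken as uniform on (−ε, ε), and the weights a_i are supplied by the caller. It evaluates α₄ exactly from cumulants and compares it with a Monte Carlo estimate. The names `alpha4_exact` and `alpha4_sample` are hypothetical.

```python
import random
from fractions import Fraction


def alpha4_exact(a):
    """alpha_4 of Z = sum(a_i * x_i), x_i independent and uniform on (-e, e).

    One uniform term has kappa_2 = e**2/3 and kappa_4 = -2*e**4/15, and
    cumulants add over independent terms, so
        alpha_4 = 3 + kappa_4 / kappa_2**2 = 3 - (6/5) * sum(a**4) / sum(a**2)**2.
    The half-width e cancels, so it is not an argument.
    """
    s2 = sum(Fraction(x) ** 2 for x in a)
    s4 = sum(Fraction(x) ** 4 for x in a)
    return 3 - Fraction(6, 5) * s4 / s2 ** 2


def alpha4_sample(a, eps=0.00005, n=100_000, seed=1):
    """Monte Carlo estimate of mu_4 / mu_2**2 for the same Z."""
    rng = random.Random(seed)
    zs = [sum(ai * rng.uniform(-eps, eps) for ai in a) for _ in range(n)]
    mean = sum(zs) / n
    m2 = sum((z - mean) ** 2 for z in zs) / n
    m4 = sum((z - mean) ** 4 for z in zs) / n
    return m4 / m2 ** 2


p = 10
print(alpha4_exact([1] * p))   # 72/25, i.e. 3 - 6/(5p)
print(alpha4_sample([1] * p))  # a Monte Carlo value near 2.88
```

With equal weights, α₄ approaches 3 as p grows. That is the sense in which Z may be treated as approximately normal when p is large.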


4. Example. Consider a correlation determinant of order 7 in which the elements are accurate to 4 decimal places. If Chiò's reduction method is applied until a 2-rowed determinant is obtained, then n = 7, k = 5, ε = .00005, w - 90, n = 176, = 00005-v/i.60/3, and we obtain from (8) that


\J\< 



•0045 + f -) .00001 -f 



.00000006 
1 - .0045 ffjH 
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where E/B is obtained from calculations involved in evaluating the deter¬ 
minant, From (16) we obtain that the probability is >.95 that 


$|J| < .0008/C.$


The relative advantage of the second inequality over the first depends on the size of the pivotal elements, as does the usefulness of either inequality.
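The example relies on Chiò's reduction. Below is a minimal sketch of one common form of the condensation identity:

det A = det B / a₁₁^{n−2}, where b_{ij} = a₁₁a_{i+1,j+1} − a_{i+1,1}a_{1,j+1}.

Whether this is exactly the arrangement of the computation assumed in the paper is not stated here, so treat the sketch as illustrative. It uses exact fractions, which avoid the rounding errors this section is about, so the result can be checked against a cofactor expansion. The names `chio_step`, `chio_det`, and `cofactor_det` are ours.

```python
from fractions import Fraction


def chio_step(a):
    """One Chio condensation step on a square matrix (list of lists).

    Returns (pivot, b) with det(a) == det(b) / pivot**(n - 2).
    """
    p = a[0][0]
    if p == 0:
        raise ValueError("pivot a[0][0] must be nonzero")
    n = len(a)
    b = [[p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
         for i in range(1, n)]
    return p, b


def cofactor_det(a):
    """Reference determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, x in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * x * cofactor_det(minor)
    return total


def chio_det(a, stop=1):
    """Condense until a `stop`-rowed determinant remains, then finish exactly."""
    a = [[Fraction(x) for x in row] for row in a]
    scale = Fraction(1)
    while len(a) > stop:
        n = len(a)
        p, a = chio_step(a)
        scale /= p ** (n - 2)       # undo the factor introduced by this step
    return scale * cofactor_det(a)


R = [["1", "0.5", "0.2"],
     ["0.5", "1", "0.3"],
     ["0.2", "0.3", "1"]]
print(chio_det(R))           # 17/25
print(chio_det(R, stop=2))   # 17/25, stopping at a 2-rowed determinant
```

The pivots are the quantities whose size, according to the paragraph above, governs how useful either error bound is.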

University of California at Los Angeles 



THE CUMULATIVE NUMBERS AND THEIR POLYNOMIALS
By P. S. Dwyer

In a recent paper [1] the author has shown how the moments of a distribution can be obtained from the last entries of cumulative columns with the use of multiplication by certain numbers. These numbers may be called "cumulative numbers." It is the aim of this paper to show how these numbers can be obtained from the expansion of $x^s$ in terms of factorials of the s-th order, and to demonstrate properties of the polynomials of which these numbers are the coefficients.


TABLE 1
Successive Frequency Cumulations

  (1)      (2)     (3)     (4)     (5)      (6)      (7)      (8)
  a + x     x      u_x     C^1     C^2      C^3      C^4      C^5
  a + 6     6       64      64      64       64       64       64
  a + 5     5      192     256     320      384      448      512
  a + 4     4      240     496     816     1200     1648     2160
  a + 3     3      160     656    1472     2672     4320     6480
  a + 2     2       60     716    2188     4860     9180    15660
  a + 1     1       12     728    2916     7776    16956    32616
  a         0        1     729    3645    11421    28377    60993


1. The values $C_i^j(u_x)$. We use the notation $C_i^j(u_x)$ of the previous paper [1, 289] to express the columnar cumulated entries. The j indicates the order of the cumulation, while the i indicates the number of the term, counting from the bottom of the column. Thus in Table 1, which presents the cumulations of a frequency distribution used in the previous paper [1, 289], $C_0^1 = 729$; $C_0^2 = 3645$; $C_1^2 = 2916$; …; $C_3^5 = 6480$, etc. Now if k + 1 values of x are spaced at unit distances and if the smallest value of x is 0, it can be shown that



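$C_0^1 = \sum_{x=0}^{k} u_x;\quad C_0^2 = \sum_{x=0}^{k} (x+1)\,u_x;\quad C_1^2 = \sum_{x=0}^{k} x\,u_x;$

$C_0^3 = \sum_{x=0}^{k} \frac{(x+2)^{(2)}}{2!}\,u_x;\quad C_1^3 = \sum_{x=0}^{k} \frac{(x+1)^{(2)}}{2!}\,u_x;\quad C_2^3 = \sum_{x=0}^{k} \frac{x^{(2)}}{2!}\,u_x;$

and, in general, for j ≥ 0 and j + 1 > i,

(1) $C_i^{j+1} = \sum_{x=0}^{k} \frac{(x+j-i)^{(j)}}{j!}\,u_x.$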
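A quick numerical check of (1) against Table 1 follows. The sketch is ours, not the author's procedure. It forms the successive cumulations from the top of the frequency column downward and compares them with the right-hand side of (1), using the fact that $(x+j-i)^{(j)}/j!$ is a binomial coefficient. The names `cumulations` and `formula_1` are hypothetical.

```python
from math import comb


def cumulations(freqs, orders):
    """Successive cumulations of a frequency column, as in Table 1.

    freqs[x] is u_x for x = 0, 1, ..., k.  Each cumulation adds from the top
    of the column (largest x) downward, so position x of a cumulated column
    is also its position counted from the bottom.  Returns a dict with
    cols[j][i] == C_i^j.
    """
    cols = {}
    col = list(freqs)
    for j in range(1, orders + 1):
        running, out = 0, [0] * len(col)
        for x in reversed(range(len(col))):
            running += col[x]
            out[x] = running
        cols[j] = out
        col = out
    return cols


def formula_1(u, i, j):
    """Right-hand side of (1) for i <= j: sum of comb(x + j - i, j) * u_x,
    since (x + j - i)^(j) / j! equals that binomial coefficient."""
    return sum(comb(x + j - i, j) * ux for x, ux in enumerate(u))


# Frequencies of Table 1, listed from x = 0 (bottom row) to x = 6 (top row).
u = [1, 12, 60, 160, 240, 192, 64]
C = cumulations(u, 5)
print(C[1][0], C[2][0], C[2][1], C[5][3])   # 729 3645 2916 6480
print(formula_1(u, 2, 4), C[5][2])          # 15660 15660
```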
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Similarly if h values of x are spaced at unit distances and if the smallest value 
of X is 1, it can be shown that 

C? = Z xw.; = Z (x - IK; C? = Z u.; 


K 


1 1 
_ ■A (x — l)(x — 2) 


21 


/is _ 

2, w*. Oa ^ 21 

and, in general, j > 0 and j + 1 > i, 

(2) = Z u,. 

*—1 J 1 

It is to be noted that the coefficients of $u_x$ in (2) could be obtained from the coefficients of $u_x$ in (1) by the substitution x + 1 = x'.


2. The powers in terms of factorials of the s-th order. If the s-th powers can be expressed in terms of factorials of the s-th order (factorials having s factors), then the moments can be expressed in terms of the cumulations. For example,

$x^2 = \frac{(x+1)^{(2)} + x^{(2)}}{2!}$, so, from (1),

$\sum_{x=0}^{k} x^2 u_x = \sum_{x=0}^{k} \frac{(x+1)^{(2)}}{2!}\,u_x + \sum_{x=0}^{k} \frac{x^{(2)}}{2!}\,u_x = C_1^3 + C_2^3.$

And since

$x^3 = \frac{(x+2)^{(3)} + 4(x+1)^{(3)} + x^{(3)}}{3!}$, we have

$\sum_{x=0}^{k} x^3 u_x = \sum_{x=0}^{k} \frac{(x+2)^{(3)}}{3!}\,u_x + 4\sum_{x=0}^{k} \frac{(x+1)^{(3)}}{3!}\,u_x + \sum_{x=0}^{k} \frac{x^{(3)}}{3!}\,u_x = C_1^4 + 4C_2^4 + C_3^4.$


In general, if

(3) $x^s = \frac{A_{s1}(x+s-1)^{(s)} + A_{s2}(x+s-2)^{(s)} + \cdots + A_{sj}(x+s-j)^{(s)} + \cdots + A_{ss}\,x^{(s)}}{s!},$

then

(4) $\sum_{x=0}^{k} x^s u_x = A_{s1}C_1^{s+1} + A_{s2}C_2^{s+1} + \cdots + A_{sj}C_j^{s+1} + \cdots + A_{ss}C_s^{s+1},$

while if the smallest value of x is 1, we have

(5) $\sum_{x=1}^{k} x^s u_x = A_{s1}C_0^{s+1} + A_{s2}C_1^{s+1} + \cdots + A_{sj}C_{j-1}^{s+1} + \cdots + A_{ss}C_{s-1}^{s+1}.$

These quantities, $A_{sj}$, in (4) and (5) are simply the coefficients of certain factorials of the s-th order in the expansion of $x^s s!$.
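Equation (4) can be tested the same way. This sketch assumes the coefficients $A_{2j} = 1, 1$ and $A_{3j} = 1, 4, 1$ read off the two examples above. It recomputes the cumulations of Table 1 directly and compares the result with the raw moments. The names `cumulate` and `moment_by_4` are illustrative.

```python
def cumulate(freqs, order):
    """Apply `order` top-down cumulations; entry i is counted from the bottom (x = 0)."""
    col = list(freqs)
    for _ in range(order):
        running = 0
        for x in reversed(range(len(col))):
            running += col[x]
            col[x] = running
    return col


def moment_by_4(freqs, A):
    """Sum of x^s * u_x by (4), with A = [A_s1, ..., A_ss] and x starting at 0."""
    s = len(A)
    C = cumulate(freqs, s + 1)              # C[i] is C_i^{s+1}
    return sum(A[j - 1] * C[j] for j in range(1, s + 1))


u = [1, 12, 60, 160, 240, 192, 64]          # Table 1, from x = 0 upward
for s, A in [(2, [1, 1]), (3, [1, 4, 1])]:
    direct = sum(x ** s * ux for x, ux in enumerate(u))
    print(s, moment_by_4(u, A), direct)     # 2 12636 12636, then 3 57996 57996
```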
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These numbers, for small values of s; are easily obtained. It is possible to 
use the table and a recursion formula of a previous paper [ 1 , 294 - 295 ] for larger 
values of s It is also possible to obtain these values, without involving cumula¬ 
tive theory, from ( 3 ) above. 

While doing this we make a more general approach by expanding $(a + x)^s$ in terms of these same factorials, with the coefficients now functions of a. This is possible if we add an additional term, $A_{s0}(x+s)^{(s)}$, to the numerator of the right hand side of (3). We have then

(6) $(a+x)^s = \frac{A_{s0}(x+s)^{(s)} + A_{s1}(x+s-1)^{(s)} + \cdots + A_{sj}(x+s-j)^{(s)} + \cdots + A_{ss}\,x^{(s)}}{s!}.$


The determination of the values $A_{sj}$ can be accomplished by purely algebraic means, by successive substitution of x = 0, 1, 2, …, s. In this way we obtain s + 1 equations in s + 1 unknowns. For example, when s = 2,

$(a+x)^2 = \frac{A_{20}(x+2)^{(2)} + A_{21}(x+1)^{(2)} + A_{22}\,x^{(2)}}{2!},$

so that when x = 0, 1, 2, we have

$a^2 = A_{20};\quad (a+1)^2 = 3A_{20} + A_{21};\quad (a+2)^2 = 6A_{20} + 3A_{21} + A_{22}.$

The solution is $A_{20} = a^2$, $A_{21} = 2ab + 1$, $A_{22} = b^2$, where b = 1 − a. It follows that

$(a+x)^2 = \frac{a^2(x+2)^{(2)} + (2ab+1)(x+1)^{(2)} + b^2x^{(2)}}{2!}$

and hence that

$\sum_{x=0}^{k} (a+x)^2 u_x = a^2C_0^3 + (2ab+1)C_1^3 + b^2C_2^3,$

as indicated in the previous paper [1, 293].

When a = 0, then b = 1 and we have $\sum x^2 u_x = C_1^3 + C_2^3$, while when a = 1, b = 0 and the right hand side becomes $C_0^3 + C_1^3$.

It follows that the general cumulative numbers might also be defined as the solutions of the s + 1 equations in the s + 1 unknowns obtained by placing x = 0, 1, 2, …, s in (6).
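Because $(m+s-j)^{(s)}$ vanishes when j > m, the s + 1 equations obtained from (6) are triangular and can be solved by forward substitution. The sketch below (our own arrangement, using exact fractions) does this for arbitrary a and s and confirms the s = 2 solution. The names `cumulative_numbers`, `falling`, and `rhs_of_6` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def cumulative_numbers(s, a):
    """Solve the s + 1 equations got by putting x = 0, 1, ..., s in (6).

    At x = m, (m + s - j)^(s) / s! equals comb(m + s - j, s); it vanishes for
    j > m and equals 1 for j = m, so the system is lower triangular.
    Returns [A_s0, A_s1, ..., A_ss].
    """
    a = Fraction(a)
    A = []
    for m in range(s + 1):
        known = sum(A[j] * comb(m + s - j, s) for j in range(m))
        A.append((a + m) ** s - known)
    return A


def falling(y, n):
    """Factorial of order n: y (y - 1) ... (y - n + 1)."""
    out = 1
    for t in range(n):
        out *= y - t
    return out


def rhs_of_6(s, a, x):
    """Evaluate the right-hand side of (6) with the computed coefficients."""
    A = cumulative_numbers(s, a)
    return sum(A[j] * falling(x + s - j, s) for j in range(s + 1)) / factorial(s)


print([str(v) for v in cumulative_numbers(2, Fraction(1, 3))])  # ['1/9', '13/9', '4/9']
print([int(v) for v in cumulative_numbers(4, 0)])               # [0, 1, 11, 11, 1]
```

Both sides of (6) are polynomials of degree s in x. Agreement at the s + 1 substituted points therefore gives the identity for every x, which `rhs_of_6` lets one check at non-integer x as well.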


3. The evaluation of the cumulative numbers. Formal algebraic methods of evaluating equations (6) are somewhat tedious, so we use finite difference theory to aid in finding the solution. As in the previous paper [1] we use the notation


V r* = Di — Vx~i and = 


fwi when a < x < o -b fc' 
\0 otherwise -• 


We then write, from ( 6 ) 
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+ xY — + s)*'* + -f- s — 1)^“^ 

(7) - 

+ + xi„(x + S — + .. + 

We note further that $\nabla^{s+1}(x+r)^{(s)} = \cdots$. We have then

(8) $\nabla^{s+1}(a+j)^s = A_{sj}.$

It has been shown in the previous paper [1, 292] that

(9) $\nabla^{s+1}(a+j)^s = \sum_{t=0}^{j} (-1)^t \binom{s+1}{t} (a+j-t)^s,$

and it appears that the cumulative numbers could be defined by (9). A useful recursion formula has been derived from (9):

(10) $\nabla^{s+2}(a+x)^{s+1} = (a+x)\,\nabla^{s+1}(a+x)^s + (s+2-a-x)\,\nabla^{s+1}(a+x-1)^s.$
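Relations (9) and (10) are easy to test against each other with exact arithmetic. In this sketch the sum in (9) runs over t = 0, …, j. That rests on our assumption that the column being differenced is zero for x < 0. The names `A_closed` and `next_row` are illustrative.

```python
from fractions import Fraction
from math import comb


def A_closed(s, j, a):
    """A_sj by (9): the (s+1)-th backward difference of (a + x)^s at x = j,
    with the terms for x < 0 omitted (the column is zero there)."""
    a = Fraction(a)
    return sum((-1) ** t * comb(s + 1, t) * (a + j - t) ** s for t in range(j + 1))


def next_row(row, s, a):
    """Recursion (10): [A_{s,0}, ..., A_{s,s}] -> [A_{s+1,0}, ..., A_{s+1,s+1}]."""
    a = Fraction(a)
    padded = list(row) + [Fraction(0)]      # A_{s,s+1} = 0
    return [(a + x) * padded[x] + (s + 2 - a - x) * (padded[x - 1] if x > 0 else 0)
            for x in range(s + 2)]


row = [A_closed(3, j, 0) for j in range(4)]
print([int(v) for v in row])                  # [0, 1, 4, 1]
print([int(v) for v in next_row(row, 3, 0)])  # [0, 1, 11, 11, 1]
```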

4. The cumulative polynomials. We define the cumulative polynomials to be the polynomials obtained by using the cumulative numbers as coefficients. Thus when a = 0,

$P_1 = y;\quad P_2 = y + y^2;\quad P_3 = y + 4y^2 + y^3;\quad P_4 = y + 11y^2 + 11y^3 + y^4;$ etc.

It is possible to derive a recursion formula for these polynomials. We use (10) with s replaced by s + 1 and a = 0 and get

(11) $P_{s+1} = \sum_x x\,\nabla^{s+1}x^s\,y^x + \sum_x (s+2-x)\,\nabla^{s+1}(x-1)^s\,y^x,$

which becomes, after some manipulation,

(12) $P_{s+1} = (1-y)\sum_x x\,\nabla^{s+1}x^s\,y^x + (s+1)\,y\,P_s.$

To illustrate, we get $P_4$ from $P_3 = y + 4y^2 + y^3$. Now $\sum_x x\,\nabla^4 x^3\,y^x = y + 8y^2 + 3y^3$, and $P_4 = (1-y)(y + 8y^2 + 3y^3) + 4y(y + 4y^2 + y^3) = y + 11y^2 + 11y^3 + y^4$. The recursion formula (12) can also be expressed in the form of a differential equation, since $P_s' = \frac{d}{dy}P_s = \sum_x x\,\nabla^{s+1}x^s\,y^{x-1}$, as

(13) $P_{s+1} = y\left[(1-y)P_s' + (s+1)P_s\right].$

It can be shown more generally that for any a

$P_{a,0} = 1;\quad P_{a,1} = a + by;\quad P_{a,2} = a^2 + (2ab+1)y + b^2y^2;$ etc., with

(14) $P_{a,s+1} = y(1-y)P_{a,s}' + \left[a(1-y) + (s+1)y\right]P_{a,s}$

as the recursion formula.
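Recursion (14) acts directly on coefficient lists. The minimal sketch below stores each polynomial as a list of Fractions in ascending powers of y, which is an implementation choice of ours. The function name is hypothetical.

```python
from fractions import Fraction


def cumulative_polynomials(a, s_max):
    """Coefficient lists (ascending powers of y) of P_{a,0}, ..., P_{a,s_max},
    generated from P_{a,0} = 1 by recursion (14):
        P_{a,s+1} = y(1 - y) P'_{a,s} + [a(1 - y) + (s + 1) y] P_{a,s}."""
    a = Fraction(a)
    P = [Fraction(1)]
    out = [P]
    for s in range(s_max):
        new = [Fraction(0)] * (len(P) + 1)
        for i, c in enumerate(P):
            # y(1 - y) P' contributes i*c to y^i and -i*c to y^(i+1)
            new[i] += i * c
            new[i + 1] -= i * c
            # [a + (s + 1 - a) y] P
            new[i] += a * c
            new[i + 1] += (s + 1 - a) * c
        P = new
        out.append(P)
    return out


print([[int(c) for c in P] for P in cumulative_polynomials(0, 4)])
# [[1], [0, 1], [0, 1, 1], [0, 1, 4, 1], [0, 1, 11, 11, 1]]
```

With a = 0 the recursion reduces to (13) and reproduces $P_1, \ldots, P_4$ listed above.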


5. The numerator coefficients in successive derivatives of the logistic function. Lotka has recently exhibited the coefficients of the numerator terms of successive derivatives of the logistic function [2, 160]. These appear to be, aside from sign, the same as the cumulative numbers when a = 0. It is shown in this section that these numbers are the cumulative numbers. The scheme is generalized to include the numerator coefficients of the derivatives of a more general function involving the parameter a.


Lotka used the function $\Phi_0 = \frac{1}{1+e^{rx}}$ and obtained

$\Phi_1 = -\frac{re^{rx}}{(1+e^{rx})^2},\qquad \Phi_2 = -\frac{r^2e^{rx}(1-e^{rx})}{(1+e^{rx})^3},$ etc.

The numerical coefficients are the same if r = 1, so we might as well use $\frac{1}{1+e^{x}}$.


A more general function is the two parameter function

(15) $\Phi_{a,c,0} = \frac{e^{ax}}{1+ce^{x}}.$

Let successive derivatives with respect to x be indicated by $\Phi_{a,c,1}$; $\Phi_{a,c,2}$; $\Phi_{a,c,3}$; etc. Then

$\Phi_{a,c,1} = \frac{e^{ax}\left[a - c(1-a)e^{x}\right]}{(1+ce^{x})^2},$

$\Phi_{a,c,2} = \frac{e^{ax}\left[a^2 - (-2a^2+2a+1)ce^{x} + (1-a)^2c^2e^{2x}\right]}{(1+ce^{x})^3}.$

In general,

$\Phi_{a,c,s} = \frac{e^{ax}\,Q_{a,c,s}}{(1+ce^{x})^{s+1}},$

so that

$\Phi_{a,c,s+1} = \frac{e^{ax}\left\{(1+ce^{x})\left[aQ_{a,c,s} + Q_{a,c,s}'\right] - (s+1)ce^{x}Q_{a,c,s}\right\}}{(1+ce^{x})^{s+2}}$

and

(16) $Q_{a,c,s+1} = (1+ce^{x})\left[aQ_{a,c,s} + Q_{a,c,s}'\right] - (s+1)ce^{x}Q_{a,c,s}.$

The Q functions can be changed to polynomials with the substitution $e^x = y$. Then derivatives are taken with respect to y, and

(17) $P_{a,c,s+1} = (1+cy)\left[aP_{a,c,s} + yP_{a,c,s}'\right] - (s+1)cy\,P_{a,c,s}.$

When c = −1, this becomes formula (14), and since $P_{a,0} = 1$, it follows that the numbers of the present section are generalized cumulative numbers. When c = 1 and a = 0 we have the numbers found by Lotka.

It can be shown, further, that c enters the coefficient of $y^j$ only through the factor $(-c)^j$. It follows that the absolute values of the coefficients are the same when c = 1 and when c = −1.
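Recursion (17) can be run for any a and c and compared with numerical derivatives of (15). In this sketch, `numerators` applies (17) to coefficient lists and `phi` rebuilds $\Phi_{a,c,s}$ from them. Both names are hypothetical, and the central-difference step used in the check is an arbitrary choice.

```python
import math
from fractions import Fraction


def numerators(a, c, s_max):
    """Coefficient lists (ascending powers of y) of P_{a,c,0}, ..., P_{a,c,s_max}
    from (17): P_{s+1} = (1 + c y)[a P_s + y P_s'] - (s + 1) c y P_s, P_0 = 1."""
    a, c = Fraction(a), Fraction(c)
    P = [Fraction(1)]
    out = [P]
    for s in range(s_max):
        inner = [a * p + i * p for i, p in enumerate(P)]    # a P + y P'
        new = [Fraction(0)] * (len(P) + 1)
        for i, q in enumerate(inner):
            new[i] += q                                    # 1 * (a P + y P')
            new[i + 1] += c * q                            # c y * (a P + y P')
        for i, p in enumerate(P):
            new[i + 1] -= (s + 1) * c * p                  # -(s + 1) c y P
        P = new
        out.append(P)
    return out


def phi(a, c, s, x):
    """Phi_{a,c,s}(x) = e^{ax} P_{a,c,s}(e^x) / (1 + c e^x)^{s+1}."""
    y = math.exp(x)
    P = numerators(a, c, s)[s]
    value = sum(float(p) * y ** i for i, p in enumerate(P))
    return math.exp(a * x) * value / (1 + c * y) ** (s + 1)


print([int(v) for v in numerators(0, 1, 4)[4]])   # [0, -1, 11, -11, 1]  (Lotka, c = 1)
print([int(v) for v in numerators(0, -1, 4)[4]])  # [0, 1, 11, 11, 1]    (c = -1)
```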


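6. Formulas for $\sum x^s$. A formula for the sums of the s-th powers of the integers from 1 to k is obtained by summing (3):

$\sum_{x=1}^{k} x^s = \frac{A_{s1}(k+s)^{(s+1)} + A_{s2}(k+s-1)^{(s+1)} + \cdots + A_{ss}(k+1)^{(s+1)}}{(s+1)!}.$

For example,

$\sum_{x=1}^{k} x^2 = \frac{(k+2)^{(3)} + (k+1)^{(3)}}{3!} = \frac{k(k+1)(2k+1)}{6},$

$\sum_{x=1}^{k} x^3 = \frac{(k+3)^{(4)} + 4(k+2)^{(4)} + (k+1)^{(4)}}{4!} = \frac{k^2(k+1)^2}{4}.$

More generally, the values of $\sum_{x=a}^{a+k-1} x^s$ can be evaluated by

(21) $\sum_{x=a}^{a+k-1} x^s = \frac{1}{(s+1)!}\sum_{j=0}^{s} (k+s-j)^{(s+1)}\,\nabla^{s+1}(a+j)^s = \sum_{j=0}^{s} C_j^{s+1}(1)\,\nabla^{s+1}(a+j)^s.$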
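Formula (21) can be checked against direct summation. The sketch evaluates $\nabla^{s+1}(a+j)^s$ by the alternating sum used in (9) and writes $(k+s-j)^{(s+1)}/(s+1)!$ as a binomial coefficient. It assumes an integer a and k ≥ 1, and the function names are ours.

```python
from fractions import Fraction
from math import comb


def A(s, j, a):
    """nabla^{s+1} (a + j)^s, evaluated by the alternating sum of (9)."""
    return sum((-1) ** t * comb(s + 1, t) * Fraction(a + j - t) ** s for t in range(j + 1))


def power_sum(a, k, s):
    """Sum of x^s for x = a, ..., a + k - 1 by (21):
    (k + s - j)^(s+1) / (s+1)! is comb(k + s - j, s + 1)."""
    return sum(comb(k + s - j, s + 1) * A(s, j, a) for j in range(s + 1))


print(power_sum(1, 10, 3))   # 3025 = 1^3 + 2^3 + ... + 10^3
print(power_sum(5, 4, 2))    # 174 = 25 + 36 + 49 + 64
```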

7. Summary. It is shown how the cumulative numbers and the cumulative polynomials may be obtained in a variety of ways. Of special interest is the fact that the cumulative numbers can be obtained by expanding powers in terms of factorials, and hence they might be called factorial coefficients of a kind. It is also possible, though not within the scope of this paper, to establish interesting relations between the cumulative numbers and the multinomial coefficients, the usual factorial coefficients, the differences of 0, etc.
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ENUMERATION AND CONSTRUCTION OF BALANCED INCOMPLETE 
BLOCK CONFIGURATIONS^ 

By Geetbtob M. Cox 

1. Introduction. One of the general problems of experimental design is to avoid extraneous effects in making desired comparisons. The method employed is to use experimental materials as nearly homogeneous as possible. Such materials, however, are seldom available in large quantities. On the contrary, field soils vary in fertility from block to block, animals vary with both litter and sex, and leaves on one young plant differ from those on another. Differences between blocks, between litters and sex, and between plants, being irrelevant to the comparisons usually contemplated, must be avoided.

When the number of treatments to be compared is small, well known methods of design, such as the Latin square or randomized complete block, are available and efficient. As the number of treatments increases, however, these designs tend to become less efficient through failure to eliminate heterogeneity. Furthermore, they become cumbersome: the Latin square design requires replicates equal in number to the treatments, and the complete block design requires that each treatment occur in every block. (Blocks are defined as an assemblage of experimental units chosen to be as nearly alike as possible.)

Because of such limitations, several modifications of the complete block design have been devised. These new designs all have the common characteristic that the experimental material is divided into groups or blocks containing fewer units than the number of treatments to be compared. These more homogeneous small blocks are referred to as incomplete blocks.

It is desirable to have all comparisons between pairs of treatments made with equal accuracy. This requires of the design that every pair of treatments occur in the same block an equal number of times. Such a design is referred to as balanced. Balanced incomplete block designs can be arranged (for any given number of treatments) only for certain combinations of block size and number of replications.²

The construction of balanced incomplete block designs is mathematically a part of the theory of configurations. A configuration is an assemblage of elements into sets, each element occurring in the same number of sets, and each set containing the same number of elements. The configurations to be considered here are the complete configurations, i.e., those in which each element occurs an equal number of times in the same set with every other element. It would be useful to know (a) what configurations (within the useful range) exist, and (b) how these configurations may be constructed.

¹ A revision of an expository paper presented under a different title at a joint meeting of the Institute of Mathematical Statistics and the Biometric Section of the American Statistical Association, December 27, 1930.

² Numerous additional designs are available in the partially balanced incomplete blocks

The typical requirement of the experimenter is this: "I wish to test t treatments and can use blocks of size k (t > k). I should like a design which will involve as little experimental material as feasible." The designer must then determine what configuration of t elements in sets of k will satisfy the incidence relation that each pair of elements occur together in a set an equal number of times, and for which the total number of sets is a minimum. There are still many configurations which the experimenter needs but which have not as yet been constructed.

In order better to explain the construction of these balanced incomplete block designs, it is essential to specify the underlying combinatorial problems. A configuration satisfying the condition of balance can be obtained by writing down all possible combinations, b, of the t elements taken k at a time,

$b = \frac{t!}{k!\,(t-k)!}.$

The simplest example is that in which each set contains only two elements and all possible combinations of the t elements, taken in pairs, appear in the different sets. This series of pairs can be written out by the experimenter, and the method of analysis is given by Yates [20].

Let us take another example: given six elements to be taken three at a time,

$b = \frac{6!}{3!\,3!} = 20.$

The 20 combinations are

123   134   146   236   345
124   135   156   245   346
125   136   234   246   356
126   145   235   256   456.


Such unreduced designs are not necessarily economical or feasible in experimental work. It is often desirable to find some less extensive configuration. In this example half of the combinations, either those in italics or the other half, fulfill the restriction that every element occur with every other element in the same number of sets. Each pair of elements occurs twice in either group of sets. Thus, a balanced incomplete block design can be based on either half of the 20 sets as well as on all 20.
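A short search can find which ten of the twenty triples form a balanced half. The sketch below is a plain backtracking search of our own, not the author's method. It returns one such half, not necessarily the half singled out in the text, and checks that the remaining ten triples are balanced as well. The names `find_balanced_subset` and `is_balanced` are hypothetical.

```python
from itertools import combinations


def pair_counts(blocks, t):
    """How many blocks contain each pair of the elements 1..t."""
    counts = {pair: 0 for pair in combinations(range(1, t + 1), 2)}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_balanced(blocks, t):
    return len(set(pair_counts(blocks, t).values())) == 1


def find_balanced_subset(t, k, b):
    """Backtracking search for b of the k-subsets of 1..t in which every pair
    occurs equally often; returns a list of blocks or None."""
    lam, rem = divmod(b * k * (k - 1) // 2, t * (t - 1) // 2)
    if rem:
        return None
    blocks = list(combinations(range(1, t + 1), k))
    counts = {pair: 0 for pair in combinations(range(1, t + 1), 2)}
    chosen = []

    def search(start):
        if len(chosen) == b:
            return True   # all counts <= lam and their total is b*k(k-1)/2
        for idx in range(start, len(blocks)):
            block_pairs = list(combinations(blocks[idx], 2))
            if all(counts[p] < lam for p in block_pairs):
                for p in block_pairs:
                    counts[p] += 1
                chosen.append(blocks[idx])
                if search(idx + 1):
                    return True
                chosen.pop()
                for p in block_pairs:
                    counts[p] -= 1
        return False

    return list(chosen) if search(0) else None


all_triples = list(combinations(range(1, 7), 3))
half = find_balanced_subset(6, 3, 10)
other = [blk for blk in all_triples if blk not in half]
print(half)
print(is_balanced(half, 6), is_balanced(other, 6))   # True True
```

The complement is automatically balanced, because in the full set of 20 triples every pair occurs t − 2 = 4 times.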

2. Combinatorial methods. Combinatorial considerations of a simple nature enable us to set up necessary conditions which balanced designs must satisfy. We have t elements arranged in b sets of k elements each; each element occurs in r sets, and each pair of elements occurs together in a set exactly λ times. Then we must have

$tr = bk, \qquad r(k-1) = \lambda(t-1).$

The first of these equations expresses the fact that the total number of plots must be equal both to the product of elements by replications and to the product of sets by number of elements per set; the second, that the number of pairs into which a given element enters must equal λ times the remaining number of elements.

It is convenient to write

$r = \frac{\lambda(t-1)}{k-1}, \qquad b = \frac{\lambda t(t-1)}{k(k-1)}.$

Since the numbers t, b, r, k, λ must be integers, it is easy to obtain lower limits for any three in terms of the other two.
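The integrality conditions translate directly into a filter on (t, k, λ). This sketch, with hypothetical helper names, also applies the necessary condition b ≥ t discussed in the next paragraph. It reproduces the parameter sets of the configurations listed further on as impossible, which illustrates that passing these tests does not imply that a design exists.

```python
def derived(t, k, lam):
    """(b, r) from r = lam(t-1)/(k-1) and b = lam t(t-1)/(k(k-1)), or None
    if either quotient is not an integer."""
    if lam * (t - 1) % (k - 1) or lam * t * (t - 1) % (k * (k - 1)):
        return None
    return lam * t * (t - 1) // (k * (k - 1)), lam * (t - 1) // (k - 1)


def admissible_t(k, lam, t_max):
    """Values of t (k < t <= t_max) passing the integrality tests and b >= t."""
    out = []
    for t in range(k + 1, t_max + 1):
        br = derived(t, k, lam)
        if br is not None and br[0] >= t:
            out.append(t)
    return out


print(derived(6, 3, 2))          # (10, 5): the balanced halves of the 20 triples
print(derived(16, 6, 1))         # (8, 3): integral, but b < t
print(admissible_t(3, 1, 21))    # [7, 9, 13, 15, 19, 21]
```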

To give a general classification, the configurations have been divided into classes according to the value of λ. Because of the practical limitations in experimentation, table I has been carried only as far as λ = 6, with k values from 1 to 14. It may be well to call attention to the fact that duplications occur in the different classes of table I. For instance, in the class λ = 1, for k = 6, t = 15m + 1, and m = 1, we have b = 8 and r = 3. In order to construct a design, the following condition is necessary: r ≥ k, and therefore b ≥ t. In this example the condition is not met; if b, r and λ are multiplied by 2, the resulting design is t = 16, b = 16, r = 6, k = 6 and λ = 2. This configuration is a duplicate of the design in the class λ = 2, for k = 6 and m = 1. In many of the configurations where λ is 3, 4, 5, or 6, a common factor can be cancelled from b, r and λ, giving a design listed in the classes λ = 1, 2 or 3.

It should be emphasized that the conditions under which table I was derived are necessary, but not sufficient, for the existence of a complete configuration. For example, consider the following configurations, which satisfy the necessary conditions for a design.

Sub-class (table I)    m      t      b      r     k     λ
10m + 5                1     15     21      7     5     2
21m + 1                1     22     22      7     7     2
15m + 6                2     36     42      7     6     1
42m + 1                1     43     43      7     7     1
45m + 10               2    100    110     11    10     1
110m + 1               1    111    111     11    11     1

No configurations of the above specification can actually be constructed.

A selected group of configurations from table I is given in table II. Only those configurations whose k, r and λ lie within practical limits, and whose existence has not been disproved, have been included. The practical limits of k, r and λ, of course, depend upon the conditions surrounding the experiment. We have chosen to keep k within the range 3 to 10, except for a few special configurations in which t is greater than 100, where k was allowed to equal 11–14. Also r has been kept within a similar limited range. (Those configurations in table II with an asterisk preceding t have not been constructed.)

The above limitations upon k and r give a small, selected group of configurations. However, many others either have been constructed or are known to exist. For balanced incomplete block designs, Yates [20] gives the lower limits of r for t from 4 to 25 and k from 2 to 12 but not greater than …. Fisher and Yates [8] have tabulated the configurations which are known to exist having ten or fewer replications, including all arithmetically possible configurations the existence of which has not been disproved.

Even if the existence of a configuration has not been disproved, there still remains the difficult problem of writing out the elements which are to appear in each set. Some discussion of the structure of such configurations is presented by Fisher and Yates [8], by Yates [20, 21], by Goulden [9, 10] and by Bose [4]. Additional descriptions are to follow.

While a search of the literature revealed a number of constructed configurations, the general theory of their formation has received relatively little consideration. The question of combinations related to the theory of configurations which is of interest here was first set forth by Kirkman [11] in 1847. He states the problem thus: "If $Q_x$ denote the greatest number of triads that can be formed with x symbols, so that no duad shall be twice employed, then

$3Q_x = x(x-1)/2 - \gamma_x,$

if for $\gamma_x$ we put 0, when x = 6m + 1 or 6m + 3." This gives the formula for b which was given earlier in this article. Put x = t and $\gamma_x = 0$; then

$b = Q_t = \frac{t(t-1)}{3 \cdot 2} = \frac{t(t-1)}{k(k-1)}.$

Besides the theory connected with these combinatorial problems, considerable 
information related to the construction of the configurations has been found in 
the literature on finite projective geometry, especially the geometry which applies 
to the theory of groups. 

An extensive discussion of the λ = 1 class of configurations (as listed in table I) can be found in the literature. The theory of the formation of the configurations for the sub-class t = 6m + 3 has been summarized by Ball [1]. This is the Kirkman "school-girl problem," for which Eckenstein [7] lists 48 papers and 5 books written during the years 1847–1911 dealing with this subject. The problem was first published in the Lady's and Gentleman's Diary for 1850 [12]. It is usually stated that "a schoolmistress was in the habit of taking her girls for a daily walk. The girls were fifteen in number, and were arranged in five rows of three each, so that each girl might have two companions. The problem is to dispose of them so that for seven consecutive days no girl will walk with any of her school-fellows in any triplet more than once."

TABLE II
Selected Group of Configurations
(Balanced Incomplete Block Designs)

     t     b     r     k     λ           t     b     r     k     λ
     7     7     3     3     1 Y.S.    *25    50     8     4     1
     7     7     4     4     2          25    30     6     5     1
     8    14     7     4     3          25 15+15     3     5     1 L.S.
     9    12     4     3     1         *25    25     9     9     3
     9   6+6     2     3     1 L.S.     28    63     9     4     1
     9    18     8     4     3          28    36     9     7     2
     9    18    10     5     5         *29    29     8     8     2
     9    12     8     6     5          31    31     6     6     1 Y.S.
    10    30     9     3     2         *31    31    10    10     3
    10    15     6     4     2         *36    45    10     8     2
    10    18     9     5     4          37    37     9     9     2
    10    15     9     6     5         *41    82    10     5     1
    11    11     5     5     2         *46    69     9     6     1
    11    11     6     6     3         *46    46    10    10     2
    13    26     6     3     1          49    56     8     7     1
    13    13     4     4     1 Y.S.     49 28+28     4     7     1 L.S.
    13    13     9     9     6         *51    85    10     6     1
    15    35     7     3     1          57    57     8     8     1 Y.S.
    15    15     7     7     3          64    72     9     8     1
    15    15     8     8     4          64 72+72     9     8     2 L.S.
    16    20     5     4     1          73    73     9     9     1 Y.S.
    16 20+20     5     4     2 L.S.     81    90    10     9     1
    16    16     6     6     2          81 45+45     5     9     1 L.S.
    16    16    10    10     6          91    91    10    10     1 Y.S.
    19    57     9     3     1         121   132    12    11     1
    19    19     9     9     4         121 66+66     6    11     1 L.S.
    19    19    10    10     5         133   133    12    12     1 Y.S.
    21    70    10     3     1         169   182    14    13     1
    21    21     5     5     1 Y.S.    169 91+91     7    13     1 L.S.
   *21    28     8     6     2         183   183    14    14     1 Y.S.
   *21    30    10     7     3

* Have not been constructed.
Y.S. Youden squares.
L.S. Lattice squares.

For this particular sub-class (t = 6m + 3, k = 3), this type of configuration has been shown to exist for every possible value of t. Most of the solutions were worked out by H. E. Dudeney and O. Eckenstein. They are given by Ball [1] for all t's less than 100, that is, for t = 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93 and 99. Ball describes several methods of constructing such configurations, such as cycles, combinations of cycles, scalene triangles inscribed in the circle, and focal and analytical methods. As an illustration of the school-girl problem, the construction of the configuration for t = 9, b = 12, r = 4, k = 3 and λ = 1 will be shown. Scalene triangles are inscribed in a circle with certain specifications (to be fulfilled), giving the three sets of triplets for the first day as follows:

Set     Group I
(1)     ∞ 1 5
(2)     3 4 6
(3)     7 8 2.

By rotation or by cyclic substitution the other three groups are secured:

Set     Group II      Set     Group III     Set     Group IV
(4)     ∞ 2 6         (7)     ∞ 3 7         (10)    ∞ 4 8
(5)     4 5 7         (8)     5 6 8         (11)    6 7 1
(6)     8 1 3,        (9)     1 2 4,        (12)    2 3 5.


Then, placing ∞ = 9, we have the configuration for t = 9, b = 12, and r = 4. Note that in the school-girl problem the sets are grouped into complete replications of the elements. This problem of 9 girls taken 3 at a time has been subjected to an exhaustive examination. There are 840 arrangements but only one fundamental solution. In the case of 15 girls, the number of fundamental solutions, according to Mulder [14] and Cole [6], is seven. Ball mentions the Kirkman problem in quartets, which is the sub-class t = 12m + 4, for k = 4. He states that this has been solved for cases where m does not exceed 49. He also states, "I conjecture that similar methods are applicable to corresponding problems about quintets, sextets, etc."
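The cyclic construction for t = 9 can be verified mechanically. This minimal sketch assumes the three triplets of Group I as given above, with ∞ written as 9, and rotates the points 1 to 8 modulo 8. The names are ours.

```python
from itertools import combinations

INF = 9   # the point written as infinity; "placing infinity = 9" in the text

GROUP_I = [(INF, 1, 5), (3, 4, 6), (7, 8, 2)]


def rotate(block, step):
    """Add `step` to each finite point, cycling through 1..8; infinity is fixed."""
    return tuple(x if x == INF else (x - 1 + step) % 8 + 1 for x in block)


groups = [[rotate(block, step) for block in GROUP_I] for step in range(4)]

pair_count = {pair: 0 for pair in combinations(range(1, 10), 2)}
for group in groups:
    for block in group:
        for pair in combinations(sorted(block), 2):
            pair_count[pair] += 1

for group in groups:
    print(group)
print(set(pair_count.values()))   # {1}: every pair of girls walks together once
```

Each group is a complete replication of the nine elements, as the text notes for the school-girl arrangement.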

Before leaving the school-girl problem, an illustration will be given of t = 28, b = 63, r = 9, k = 4 and λ = 1. The following framework was set up by Dr. C. P. Winsor using suggestions from Netto [15].

∞    a    b    c
a₁   a₈   b₃   b₆
a₂   a₇   b₁   b₈
a₃   a₆   c₄   c₅
a₄   a₅   c₁   c₈
b₂   b₇   c₃   c₆
b₄   b₅   c₂   c₇

The sets a, b and c each have every internal difference once and only once, and each pair a-b, a-c and b-c must have every external difference once and only once. The nine groups are given in table III. The cyclic substitution is within the three sets a, b and c. That is,
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in group I, 

a = 1, 

Oi = 2, 

02 = 3, • • 

■ , Oj != 9 

in group 11, 

a = 2, 

Oi = 3, 

02 = 4, • • 

■ , 08 = 1 

in group III, 

0 = 3, 

Oi = 4, 

02 ~ 5, * • 

• , 08 = 2 


etc. 
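The framework and its cyclic substitution can be expanded and checked by machine. In this sketch the element numbering (a = 1, …, 9; b = 10, …, 18; c = 19, …, 27; ∞ = 28) is inferred from the entries of table III, and the helper names are hypothetical. It confirms that every pair of the 28 elements occurs together exactly once in the 63 sets.

```python
from collections import Counter
from itertools import combinations

# Framework of Group I: each entry is (set name, index); "inf" is the fixed point.
FRAMEWORK = [
    [("inf", 0), ("a", 0), ("b", 0), ("c", 0)],
    [("a", 1), ("a", 8), ("b", 3), ("b", 6)],
    [("a", 2), ("a", 7), ("b", 1), ("b", 8)],
    [("a", 3), ("a", 6), ("c", 4), ("c", 5)],
    [("a", 4), ("a", 5), ("c", 1), ("c", 8)],
    [("b", 2), ("b", 7), ("c", 3), ("c", 6)],
    [("b", 4), ("b", 5), ("c", 2), ("c", 7)],
]
FIRST = {"a": 1, "b": 10, "c": 19}   # assumed numbering, inferred from table III


def element(label, shift):
    """Number of the element with this label after `shift` cyclic steps."""
    name, index = label
    if name == "inf":
        return 28
    return FIRST[name] + (index + shift) % 9


groups = [[tuple(element(label, shift) for label in block) for block in FRAMEWORK]
          for shift in range(9)]

pairs = Counter(pair for group in groups for block in group
                for pair in combinations(sorted(block), 2))
print(groups[0])
print(len(pairs), set(pairs.values()))   # 378 {1}
```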


Netto [15] discusses t elements in sets of k, every set of 2 elements to occur together in a set exactly λ times. He deals with λ = 1, and gives a discussion of both sub-classes when k = 3, that is, for t = 6m + 1 and t = 6m + 3. Reiss [16] and Moore [13] have proved that configurations can be constructed for all values of t if k = 3. This is the type of information which is valuable in answering the first question in the introduction of this article: "what configurations exist?" Carmichael [5] mentions the quadruple systems 6m + 2 and 6m + 4 and states that the general problem of their existence appears not to have been solved. Also, for the higher values of k there seems to be very little known of any generality, but it is known that for k > 3 there are certain configurations which are not possible.

TABLE III
Configuration for t = 28, b = 63, r = 9, k = 4, λ = 1

               Group I        Group II       Group III      Group IV
∞  a  b  c     28  1 10 19    28  2 11 20    28  3 12 21    28  4 13 22
a₁ a₈ b₃ b₆     2  9 13 16     3  1 14 17     4  2 15 18     5  3 16 10
a₂ a₇ b₁ b₈     3  8 11 18     4  9 12 10     5  1 13 11     6  2 14 12
a₃ a₆ c₄ c₅     4  7 23 24     5  8 24 25     6  9 25 26     7  1 26 27
a₄ a₅ c₁ c₈     5  6 20 27     6  7 21 19     7  8 22 20     8  9 23 21
b₂ b₇ c₃ c₆    12 17 22 25    13 18 23 26    14 10 24 27    15 11 25 19
b₄ b₅ c₂ c₇    14 15 21 26    15 16 22 27    16 17 23 19    17 18 24 20

               Group V        Group VI       Group VII      Group VIII     Group IX
∞  a  b  c     28  5 14 23    28  6 15 24    28  7 16 25    28  8 17 26    28  9 18 27
a₁ a₈ b₃ b₆     6  4 17 11     7  5 18 12     8  6 10 13     9  7 11 14     1  8 12 15
a₂ a₇ b₁ b₈     7  3 15 13     8  4 16 14     9  5 17 15     1  6 18 16     2  7 10 17
a₃ a₆ c₄ c₅     8  2 27 19     9  3 19 20     1  4 20 21     2  5 21 22     3  6 22 23
a₄ a₅ c₁ c₈     9  1 24 22     1  2 25 23     2  3 26 24     3  4 27 25     4  5 19 26
b₂ b₇ c₃ c₆    16 12 26 20    17 13 27 21    18 14 19 22    10 15 20 23    11 16 21 24
b₄ b₅ c₂ c₇    18 10 25 21    10 11 26 22    11 12 27 23    12 13 19 24    13 14 20 25

3. The method of geometrical configuration. Another aid in the construction of balanced incomplete block designs is found in some of the finite projective geometries. These are described by Carmichael [5]. A tactical configuration of rank two is defined as a combination of l elements into m sets, each set containing λ distinct elements, and each element occurring in μ distinct sets. Here

l = (t) = number of points in the geometry,
m = (b) = number of lines,
λ = (k) = number of points on a line,
μ = (r) = number of lines on a point.


The series of finite projective geometries PG(k, p^n) for k > 1 furnishes a certain infinite class of these tactical configurations. The following list gives those which have been incorporated in the list (table II) of useful balanced incomplete block designs.

Two dimensional space, PG(2, p^n)

p^n    l (t)   m (b)   μ (r)   λ (k)
2          7       7       3       3
3         13      13       4       4
2²        21      21       5       5
5         31      31       6       6
7         57      57       8       8
2³        73      73       9       9
3²        91      91      10      10
11       133     133      12      12
13       183     183      14      14


Three dimensional space, PG(3, p^n)

p^n    l (t)   m (b)   μ (r)   λ (k)
2         15      35       7       3

From the Euclidean geometry EG{k, p”) for k 

> 1 other tactical configurations 

can be constructed. 

These are formed from the PG{k, p") by omitting a given 

line from the two dimensional space and a plane from the throe dimensional 

space configurations 

Some of the resulting designs are: 



Two dimensional space, $EG(2, p^n)$

pⁿ      l (t)   m (b)   λ (k)   μ (r)
2         4       6       2       3
3         9      12       3       4
2²       16      20       4       5
5        25      30       5       6
7        49      56       7       8
2³       64      72       8       9
3²       81      90       9      10
11      121     132      11      12
13      169     182      13      14
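Deleting a line of $PG(2, q)$ with its $q + 1$ points, or a plane of $PG(3, q)$ with its points, leaves $EG(k, q)$. That geometry has $q^k$ points and $q$ points on each line, and the number of lines through each remaining point is unchanged. The sketch below only counts and constructs nothing; the helper name `eg_line_design` is hypothetical.

```python
def eg_line_design(k: int, q: int) -> dict:
    """Points and lines of EG(k, q): PG(k, q) with one hyperplane and its points removed.

    Counting only; q is assumed to be a prime power p**n.
    """
    points = q ** k                          # t
    per_line = q                             # k: each surviving line loses one point
    per_point = (q ** k - 1) // (q - 1)      # r: lines through a point are unchanged
    lines = points * per_point // per_line   # b, from t * r = b * k
    return {"t": points, "b": lines, "k": per_line, "r": per_point, "lambda": 1}


for q in (2, 3, 4, 5, 7, 8, 9, 11, 13):
    d = eg_line_design(2, q)
    print(q, d["t"], d["b"], d["k"], d["r"])
```

For $k = 2$ the number of lines is $q^2 + q$. That is one fewer than the $q^2 + q + 1$ lines of $PG(2, q)$, as expected when a single line is removed.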
Methods are available for constructing the two dimensional $PG(k, p^n)$ and the corresponding $EG(k, p^n)$ configurations, where $p$ is a prime number. This being true, we can also construct the completely orthogonalized squares from the $EG(k, p^n)$ geometry. The reverse process, in which these configurations are constructed from the completely orthogonalized squares, will be illustrated here. These squares consist of superimposed Latin squares, fulfilling the condition that each number from the second Latin square occurs once and only once with each number in the first Latin square. As an example take the two Latin squares:

Latin Square I      Latin Square II
1 2 3               1 3 2
2 3 1               2 1 3
3 1 2               3 2 1

Superimpose square II upon square I to get the completely orthogonalized 3 × 3 square:

11  23  32
22  31  13
33  12  21

The first number in each cell is a value from square I; the second number in each cell is from square II. Note that the numbers in the second place in each cell occur once and only once with each of the first numbers; that is, 1-1, 1-3, and 1-2. The completely orthogonalized squares have been proven to exist for all prime numbers and for powers of prime numbers. The solution of this problem was secured independently by Bose [2] and by Stevens [18]. Those of sides 2, 2², 2³, 2⁴, 2⁵, 2⁶, 3, 3², 3³, 3⁴, 5, 5², 5³, 7, 7², 11 and 13 have been given.
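For a prime side $p$, the squares $L_a(i, j) = (i + a j) \bmod p + 1$ for $a = 1, \ldots, p - 1$ form a complete set of mutually orthogonal Latin squares. With $p = 3$, the choices $a = 1, 2$ give exactly squares I and II above. The sketch below illustrates the prime case only. Prime powers such as 4, 8 or 9 need arithmetic in the Galois field $GF(p^n)$, which is not shown. The function names are illustrative.

```python
def latin_square(p: int, a: int) -> list:
    """Square with entry (i + a*j) mod p + 1; a Latin square when p is prime and 1 <= a < p."""
    return [[(i + a * j) % p + 1 for j in range(p)] for i in range(p)]


def superimpose(sq1, sq2):
    """Cell-by-cell pairs: first number from sq1, second from sq2."""
    return [list(zip(r1, r2)) for r1, r2 in zip(sq1, sq2)]


def is_orthogonal(sq1, sq2) -> bool:
    """True if each ordered pair of symbols occurs exactly once in the superimposed square."""
    cells = [cell for row in superimpose(sq1, sq2) for cell in row]
    return len(set(cells)) == len(sq1) ** 2


square_I = latin_square(3, 1)   # rows 1 2 3 / 2 3 1 / 3 1 2
square_II = latin_square(3, 2)  # rows 1 3 2 / 2 1 3 / 3 2 1
for row in superimpose(square_I, square_II):
    print(" ".join(f"{x}{y}" for x, y in row))
```

Orthogonality holds for $a \ne b$ because the pair of entries $(i + aj, i + bj)$ determines $j$ through $(a - b)j$. Since $p$ is prime, $a - b$ is invertible modulo $p$.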

The completely orthogonalized 3 × 3 square may be used to construct a balanced incomplete block design:

11 (1)   23 (4)   32 (7)
22 (2)   31 (5)   13 (8)
33 (3)   12 (6)   21 (9)

The numbers in parentheses, which follow the cell numbers, designate the 9 elements which are to be arranged in four groups of three sets. Group I is formed by placing the elements from each row into separate sets. In group II the elements from the three columns are placed in three sets. In group III, the first set (7) consists of the elements which follow 1 in the first place in the cells, set (8) of those which follow 2, and set (9) of those which follow 3. Group IV is assembled in the same way as group III, except that the numbers in the second place in the cells are used to select the elements for each set. Thus we have the configuration:
Group I           Group II            Group III            Group IV
(rows)            (columns)           (first place)        (second place)
Set               Set                 Set                  Set
(1) 1 4 7         (4) 1 2 3           (7) 1 6 8            (10) 1 5 9
(2) 2 5 8         (5) 4 5 6           (8) 2 4 9            (11) 2 6 7
(3) 3 6 9         (6) 7 8 9           (9) 3 5 7            (12) 3 4 8

In the 12 sets of 3 elements, each of the 9 elements occurs with every other element once and only once in a set.
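The grouping rule can be written out directly. In the Python sketch below, the cells of the superimposed square and the element labels 1 to 9 are copied from the example; the labels are numbered down the columns. The helper `groups_from_square` is a hypothetical name. The last lines count how often each pair of elements shares a set, and each count should be exactly one.

```python
from collections import Counter
from itertools import combinations

# Superimposed square: (number from square I, number from square II)
cells = [[(1, 1), (2, 3), (3, 2)],
         [(2, 2), (3, 1), (1, 3)],
         [(3, 3), (1, 2), (2, 1)]]
# Element attached to each cell, numbered down the columns as in the example
labels = [[1, 4, 7],
          [2, 5, 8],
          [3, 6, 9]]


def groups_from_square(cells, labels):
    """Return groups I-IV: rows, columns, by first place, by second place."""
    n = len(cells)
    rows = [sorted(labels[i]) for i in range(n)]
    cols = [sorted(labels[i][j] for i in range(n)) for j in range(n)]
    first = [sorted(labels[i][j] for i in range(n) for j in range(n) if cells[i][j][0] == s)
             for s in range(1, n + 1)]
    second = [sorted(labels[i][j] for i in range(n) for j in range(n) if cells[i][j][1] == s)
              for s in range(1, n + 1)]
    return [rows, cols, first, second]


groups = groups_from_square(cells, labels)
sets = [s for group in groups for s in group]
pair_counts = Counter(pair for s in sets for pair in combinations(s, 2))
print(groups)
print(len(sets), "sets; pair multiplicities:", set(pair_counts.values()))
```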

This is an illustration of one series of configurations which can be constructed with the aid of the completely orthogonalized squares. These are the $EG(k, p^n)$ in two dimensional space, when $k = 2$ and $p^n = 2, 3, 2^2, 5, 7, 2^3, 3^2, 11, 13, \ldots$. The $PG(k, p^n)$ configurations can be written by adding $p^n + 1$ new elements, one for each group of sets, to the previous group of configurations. For example, the elements 10, 11, 12 and 13 may be added to the groups, one to each group. That is, 10 is added to each set in group I, 11 is added to each set in group II, 12 to group III and 13 to group IV. An additional set must be added to include these four new elements. A configuration for $t = 13$, $b = 13$, $k = 4$, $r = 4$ and $\lambda = 1$ results.

Set
(1)   1  4  7 10     (4)   1  2  3 11     (7)   1  6  8 12     (10)  1  5  9 13
(2)   2  5  8 10     (5)   4  5  6 11     (8)   2  4  9 12     (11)  2  6  7 13
(3)   3  6  9 10     (6)   7  8  9 11     (9)   3  5  7 12     (12)  3  4  8 13

(13) 10 11 12 13


The 13 sets are made up of 4 elements each. These designs are symmetrical for sets and elements. That is, every pair of elements occurs together in the same number of sets, and every pair of sets has the same number of elements in common. Discussions of the construction of these designs, with illustrations, are given in references [20, 8, 9] and [19].
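The extension to the 13-element design adds one new element to every set of each group of parallel sets, plus one set made up of the new elements. The sketch below copies the 12 sets from the 3 × 3 example; its helper names are illustrative. It checks both symmetry properties: every pair of elements lies in exactly one set, and every pair of sets meets in exactly one element.

```python
from itertools import combinations

# The twelve sets from the 3 x 3 example, listed group by group
groups = [
    [[1, 4, 7], [2, 5, 8], [3, 6, 9]],   # group I (rows)
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # group II (columns)
    [[1, 6, 8], [2, 4, 9], [3, 5, 7]],   # group III (first place)
    [[1, 5, 9], [2, 6, 7], [3, 4, 8]],   # group IV (second place)
]


def extend_to_projective(groups, first_new):
    """Append one new element to every set of each group, then add a set of all new elements."""
    new = list(range(first_new, first_new + len(groups)))
    sets = [s + [e] for g, e in zip(groups, new) for s in g]
    return sets + [new]


def pair_counts(sets):
    """Distinct values of 'number of sets containing both a and b' over all element pairs."""
    elements = sorted({x for s in sets for x in s})
    return {sum(1 for s in sets if a in s and b in s) for a, b in combinations(elements, 2)}


def intersection_sizes(sets):
    """Distinct sizes of the intersection of two different sets."""
    return {len(set(s) & set(t)) for s, t in combinations(sets, 2)}


sets = extend_to_projective(groups, 10)
print(len(sets), pair_counts(sets), intersection_sizes(sets))
```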

In the $PG(k, p^n)$ series of designs, as constructed by means of completely orthogonalized squares, the sets cannot be arranged in replication groups. However, these configurations can be arranged in Youden squares [22], in which all the sets are placed side by side and all the elements in a single row form a complete replication. This method of arrangement has been of considerable value in experimentation with plants. The Youden squares are the $PG(k, p^n)$ when $k = 2$. Singer [17] gives a partial list of the (reduced) perfect difference sets (table IV), with only a single set for each $p^n$. The number of distinct perfect difference sets (or the number of distinct perfect partitions) for a given $p^n$ is equal to

$$\frac{\phi(p^{2n} + p^n + 1)}{3n},$$

where $\phi$ is Euler's function. Since each perfect difference set can be paired with its inverse, the number is even.

The construction of one of the Youden squares from its perfect difference set will be illustrated. Consider $p^n = 3$; then $q = p^{2n} + p^n + 1 = 3^2 + 3 + 1 = 13$. There are two perfect difference sets, with their inverses, for $q = 13$. One perfect difference set is 0, 1, 3, 9, which has the perfect partition 1, 2, 6, 4. The sums of cyclically consecutive terms of this partition give each number from 1 to 12 exactly once, and $1 + 2 + 6 + 4 = 13$. The elements of the perfect difference set are put in set (1), except that 13 replaces 0. Set (2) is secured by a one-step cyclic substitution: 1 for 13, 2 for 1, 4 for 3 and 10 for 9. This process is continued until there are thirteen sets. If the substitution is applied to set (13), the elements in set (1) are secured.






Set              (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)
Replication  A   13   1   2   3   4   5   6   7   8    9   10   11   12
             B    1   2   3   4   5   6   7   8   9   10   11   12   13
             C    3   4   5   6   7   8   9  10  11   12   13    1    2
             D    9  10  11  12  13   1   2   3   4    5    6    7    8

This is the Youden square for $t = 13$, $b = 13$, $r = 4$, $k = 4$, and $\lambda = 1$. The elements in each row form a complete replication.
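One reading of the cyclic substitution is that it adds $j$ (mod 13) to every element of the difference set, so each replication row is a cyclic shift of 1 to 13. The Python sketch below assumes that reading and writes the residue 0 as 13, as the text does. Column $j$ of the result is set $(j + 1)$, and `youden_square` is an illustrative name.

```python
from itertools import combinations


def youden_square(diff_set, q):
    """Row for each difference-set element d: d + j (mod q) for columns j = 0 .. q-1.

    The residue 0 is written as q, as in the text; column j is set (j + 1).
    """
    def label(x):
        return x if x != 0 else q
    return [[label((d + j) % q) for j in range(q)] for d in diff_set]


rows = youden_square([0, 1, 3, 9], 13)
for name, row in zip("ABCD", rows):
    print(name, *row)

# Balance check: every pair of the 13 elements should share exactly one column (set)
columns = [set(col) for col in zip(*rows)]
lam = {sum(1 for c in columns if a in c and b in c) for a, b in combinations(range(1, 14), 2)}
print("sets containing each pair:", lam)
```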


TABLE IV

Singer's list of perfect difference sets

pⁿ      q     φ(q)/3n   Perfect difference set
2        7       2      0 1 3
2²      21       2      0 1 4 14 16
2³      73       8      0 1 3 7 15 31 36 54 63
2⁴     273      12      0 1 3 7 15 31 63 90 116 127 136 181 194 204
3       13       4      0 1 3 9
3²      91      12      0 1 3 9 27 49 56 61 77 81
5       31      10      0 1 3 8 12 18
7       57      12      0 1 3 13 32 36 43 52
11     133      36      0 1 3 12 20 34 38 81 88 94 104 109
13     183      40      0 1 3 16 23 28 42 76 82 86 119 137 154 175

$t = q = p^{2n} + p^n + 1$
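Whether a listed set is a perfect difference set can be checked mechanically. Modulo $q$, the $(p^n + 1)p^n$ differences of ordered pairs of distinct elements must cover each nonzero residue exactly once. The sketch below runs that brute-force check on each set in the table whose $p^n + 1$ elements are all listed. It also evaluates Singer's count $\phi(p^{2n} + p^n + 1)/3n$. The function names are hypothetical, and the code is written for clarity rather than speed.

```python
from math import gcd


def is_perfect_difference_set(ds, q):
    """True if the differences a - b (a != b) hit every nonzero residue mod q exactly once."""
    diffs = sorted((a - b) % q for a in ds for b in ds if a != b)
    return diffs == list(range(1, q))


def euler_phi(m):
    """Euler's function by direct count (adequate for the small q used here)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def singer_count(p, n):
    """phi(p^(2n) + p^n + 1) / 3n."""
    q = p ** (2 * n) + p ** n + 1
    return euler_phi(q) // (3 * n)


table = {
    7: [0, 1, 3],
    21: [0, 1, 4, 14, 16],
    73: [0, 1, 3, 7, 15, 31, 36, 54, 63],
    13: [0, 1, 3, 9],
    91: [0, 1, 3, 9, 27, 49, 56, 61, 77, 81],
    31: [0, 1, 3, 8, 12, 18],
    57: [0, 1, 3, 13, 32, 36, 43, 52],
    133: [0, 1, 3, 12, 20, 34, 38, 81, 88, 94, 104, 109],
    183: [0, 1, 3, 16, 23, 28, 42, 76, 82, 86, 119, 137, 154, 175],
}
for q, ds in table.items():
    print(q, is_perfect_difference_set(ds, q))
```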


A third series of configurations, called lattice squares or quasi-Latin squares [21], can be constructed by using the completely orthogonalized squares. The groups of sets on page 78 (groups I to IV above) are taken in pairs. For each pair a square is constructed having its rows formed by the sets of one group and its columns by the sets of another group. For example, square I below is made so that the sets of group I form the rows and the sets of group II form the columns. Square II is the combination of groups III and IV.

Square I       Square II
1 4 7          1 6 8
2 5 8          9 2 4
3 6 9          5 7 3

In this lattice square each pair of elements occurs together once only, in either a row or a column of one of the two squares. Also, each element occurs in exactly one row and one column of each square.
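The balance property of the lattice square can be verified by counting the rows and columns, in either square, that contain each pair of the nine elements. The two squares are copied from above, and everything else in this Python sketch is illustrative.

```python
from itertools import combinations

square_1 = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]   # rows: group I, columns: group II
square_2 = [[1, 6, 8], [9, 2, 4], [5, 7, 3]]   # rows: group III, columns: group IV


def lines_of(square):
    """All rows and all columns of a square, as sets."""
    rows = [set(r) for r in square]
    cols = [set(c) for c in zip(*square)]
    return rows + cols


lines = lines_of(square_1) + lines_of(square_2)
counts = {pair: sum(1 for line in lines if set(pair) <= line)
          for pair in combinations(range(1, 10), 2)}
print(len(lines), "lines; pair multiplicities:", set(counts.values()))
```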

A device known as "complements" gives several configurations. From an arrangement with $k$ elements in each set, a second one can be obtained for the same number of elements, in sets of $t - k$ units. This is done by replacing each set by its complement, that is, by a set containing all the elements missing from the original set. An illustration follows:

t = 7, b = 7, r = 3, k = 3, λ = 1        t = 7, b = 7, r = 4, k = 4, λ = 2

Set                                      Set
(1)  1 2 4                               (1)  3 5 6 7
(2)  2 3 5                               (2)  1 4 6 7
(3)  3 4 6                               (3)  1 2 5 7
(4)  4 5 7                               (4)  1 2 3 6
(5)  5 6 1                               (5)  2 3 4 7
(6)  6 7 2                               (6)  1 3 4 5
(7)  7 1 3                               (7)  2 4 5 6
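The complementing device is a set difference, and the resulting parameters can be checked by direct counting. For a balanced design the complement has $\lambda' = b - 2r + \lambda$, which gives $7 - 6 + 1 = 2$ here. The sketch below uses hypothetical helper names and raises an error if the design it is given is not balanced.

```python
from itertools import combinations


def complement_design(sets, t):
    """Replace each set by the elements of 1..t that it does not contain."""
    universe = set(range(1, t + 1))
    return [sorted(universe - set(s)) for s in sets]


def design_parameters(sets, t):
    """Return (t, b, r, k, lambda), raising ValueError if the design is not balanced."""
    ks = {len(s) for s in sets}
    rs = {sum(1 for s in sets if x in s) for x in range(1, t + 1)}
    lams = {sum(1 for s in sets if a in s and c in s) for a, c in combinations(range(1, t + 1), 2)}
    if len(ks) != 1 or len(rs) != 1 or len(lams) != 1:
        raise ValueError("design is not balanced")
    return t, len(sets), rs.pop(), ks.pop(), lams.pop()


original = [[1, 2, 4], [2, 3, 5], [3, 4, 6], [4, 5, 7], [5, 6, 1], [6, 7, 2], [7, 1, 3]]
complement = complement_design(original, 7)
print(design_parameters(original, 7))    # (t, b, r, k, lambda)
print(design_parameters(complement, 7))
```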


While the triple systems, quadruple systems, etc., which have been considered by some mathematicians, do furnish designs meeting the balance requirements, they are usually not suitable for experimental purposes. A quadruple system requires that every possible triple of elements occur once and only once together in a block. Since we need only every pair together once ($\lambda = 1$) or more, only the triple systems are generally useful.

4. Summary. The mathematical theory of configurations has been helpful in the construction of balanced incomplete block designs. It would be useful to know (a) what configurations (within the useful range) exist, and (b) how these configurations may be constructed. In table I the configurations have been classified according to the value of $\lambda$, while in table II configurations within a useful range have been listed. Of the designs in this table which have not been constructed, some are known to exist. Those aids which have been used in the construction of the balanced incomplete block designs have been briefly discussed.
A COMPARISON OF ALTERNATIVE TESTS OF SIGNIFICANCE FOR THE PROBLEM OF m RANKINGS¹

By Milton Friedman

A paper published in 1937 [2] suggested that the consilience of a number of sets of ranks can be tested by computing a statistic designated $\chi_r^2$. A mathematical proof by S. S. Wilks demonstrated that the distribution of $\chi_r^2$ approaches the ordinary $\chi^2$ distribution as the number of sets of ranks increases. The rapidity with which this limiting distribution is approached was investigated by obtaining the exact distributions of $\chi_r^2$ for a number of special cases. It was concluded that "when the number of sets of ranks is moderately large (say greater than 5 for four or more ranks) the significance of $\chi_r^2$ can be tested by reference to the available $\chi^2$ tables" [2, p. 695]. The use of the normal distribution was recommended when the number of ranks in each set is large but the number of sets of ranks is small, although no rigorous justification of this procedure was presented.

Except for the few special cases for which exact distributions were given, the paper did not provide a test of significance for data involving fewer than six sets of ranks and a small or moderate number of ranks in each set. This important gap has now been filled by M. G. Kendall and B. Babington Smith [1]. In addition, they furnish a somewhat more exact test of significance for tables of ranks for which the earlier article recommended the use of the $\chi^2$ distribution.

Kendall and Smith use a different statistic, $W$, defined as $\chi_r^2$ divided by its maximum value, $m(n - 1)$, where $n$ is the number of items ranked and $m$ the number of sets of ranks.² The new statistic is thus not fundamentally different from $\chi_r^2$. It was suggested independently by W. Allen Wallis [3], who terms it the rank correlation ratio and denotes it by $\eta_r^2$. A more radical innovation is the improvement in the test of significance that Kendall and Smith suggest. Generalizing from the first four moments of $W$, they recommend testing the significance of $W$ against the analysis of variance distribution (Fisher's $z$-distribution), instead of testing $\chi_r^2$ against the $\chi^2$ distribution for $n - 1$ degrees of freedom. The parameters are

$$z = \tfrac{1}{2}\log_e \frac{(m-1)W}{1-W}, \qquad n_1 = (n-1) - \frac{2}{m}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right].$$

For small values of $m$ and $n$, they introduce continuity corrections. In place of $W = 12S/\{m^2(n^3 - n)\}$ they use the statistic

$$W' = \frac{S - 1}{\frac{1}{12}m^2(n^3 - n) + 2} = \frac{12(S - 1)/\{m^2(n^3 - n)\}}{1 + 24/\{m^2(n^3 - n)\}},$$

where $S$ is the observed sum of squares of the deviations of sums of ranks from the mean value, $m(n + 1)/2$. Comparison with exact distributions of $W$ (or $S$) for special cases indicates that this test yields very good approximations to the correct probabilities.

¹ The author is indebted to Mr. W. Allen Wallis for valuable criticism and to Miss Edna R. Ehrenberg for computational assistance.

² This is Kendall and Smith's notation, which will be used in the present paper. The original paper [2] designated the number of items ranked by $p$, and the number of sets of ranks by $n$.

In the limit the two tests of significance are identical. Neglecting the correction for continuity, as $m \to \infty$,

$$z = \tfrac{1}{2}\log_e \frac{(m-1)\chi_r^2}{m(n-1) - \chi_r^2} \to \tfrac{1}{2}\log_e \frac{\chi_r^2}{n-1}, \qquad n_2 \to \infty, \qquad n_1 = (n-1) - \frac{2}{m} \to n - 1.$$

For $n_2 = \infty$ the analysis of variance distribution is identical with the distribution of $\tfrac{1}{2}\log_e(\chi^2/n_1)$. The difference between the two tests is thus that one, the $\chi^2$ test, uses a single (limiting) distribution for all values of $m$. The other, the $z$ test, adapts the distribution to the value of $m$.
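The Python sketch below computes $S$, $W$, $W'$, $\chi_r^2$, $z$, $n_1$ and $n_2$ for a concrete table of ranks. It assumes untied ranks, so each row is a permutation of 1 to $n$, and it assumes values for which both logarithms are defined. `rank_statistics` is a hypothetical name. The sketch stops at the statistics and their degrees of freedom, because judging $z$ still requires the analysis of variance tables.

```python
import math


def rank_statistics(ranks):
    """Friedman/Kendall-Smith quantities for m sets of untied ranks of n items.

    ranks: list of m rows, each a permutation of 1..n.
    Assumes m >= 2, S > 1 and W < 1 so that both logarithms are defined.
    """
    m, n = len(ranks), len(ranks[0])
    sums = [sum(row[j] for row in ranks) for j in range(n)]   # rank total per item
    mean = m * (n + 1) / 2
    S = sum((s - mean) ** 2 for s in sums)
    denom = m ** 2 * (n ** 3 - n)
    W = 12 * S / denom
    W_corr = (S - 1) / (denom / 12 + 2)                        # continuity-corrected W'
    chi_r2 = m * (n - 1) * W
    n1 = (n - 1) - 2 / m
    n2 = (m - 1) * n1

    def z_of(w):
        return 0.5 * math.log((m - 1) * w / (1 - w))

    return {"S": S, "W": W, "W_corr": W_corr, "chi_r2": chi_r2,
            "z": z_of(W), "z_corr": z_of(W_corr), "n1": n1, "n2": n2}


example = [[1, 2, 3, 4],
           [1, 2, 4, 3],
           [2, 1, 3, 4]]
stats = rank_statistics(example)
print(stats)
```

For this small example ($m = 3$, $n = 4$) the rank totals are 4, 5, 10, 11, so $S = 37$, $W = 37/45$ and $\chi_r^2 = 7.4$.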

Taking the value of $m$ into account increases the flexibility of the distribution, but it makes the $z$ test somewhat less convenient in practice than the $\chi^2$ test. Additional computation is required to obtain the values of $n_1$ and $n_2$ and to make the continuity corrections. It is also fairly laborious to test the significance of the result if exact values of $z$ at any level of significance are required. In these instances, two-way interpolation of reciprocals in the analysis of variance tables is necessary, since both $n_1$ and $n_2$ are always fractional. These difficulties make it desirable to investigate how rapidly the significance levels given by the $z$ test approach those given by the $\chi^2$ test. This determines the range of values of $m$ and $n$ for which the simpler test can safely be employed. As a by-product, the investigation yields the .05 and .01 significance values of $\chi_r^2$ (or $W$ or $S$), as determined by the $z$ test, for selected values of $m$ and $n$.

Table I presents a summary comparison of the values of $\chi_r^2$ at the .05 and .01 levels of significance as shown by (1) exact distributions, (2) the $z$ test with continuity corrections, and (3) the $\chi^2$ test.³ The significance values are expressed in terms of $\chi_r^2$ rather than $W$ for the following reason. For a given number of ranks per set (i.e., a given $n$), the significance values given by the $\chi^2$ test are the same regardless of the number of sets of ranks (i.e., of the value of $m$). This would not be so if $W$ were employed, since $W = \chi_r^2/m(n - 1)$. The expected value of $W$ depends on $m$ (it equals $1/m$) and approaches zero as $m \to \infty$, while the expected value of $\chi_r^2$ is equal to $n - 1$ for all values of $m$.

³ The values of $\chi_r^2$ computed using the $z$ test that are given in Tables I and II were obtained with the aid of Fisher and Yates' Table V [4]. Linear interpolation of reciprocals was employed throughout.

The values given by the $z$ test agree remarkably well with the exact values. There are only two exceptions, the .01 values for $n = 3$, $m = 8$ and 10. Otherwise the exact value differs very much less from the value given by the $z$ test than from the value given by the $\chi^2$ test. In all but three of the 12 comparisons, the $z$ test gives a value below the correct one.

TABLE I

Comparison of Values of $\chi_r^2$ at .05 and .01 Levels of Significance Yielded by Exact Distributions, $z$ Test with Continuity Corrections, and $\chi^2$ Test

Columns: (a) limits from the exact distribution; (b) value interpolated from the exact distribution*; (c) from the $z$ test with continuity corrections; (d) from the $\chi^2$ test.

              .05 Level of Significance              .01 Level of Significance
n   m     (a)          (b)     (c)     (d)       (a)           (b)     (c)     (d)
3   8     5.25-6.25    6.16    6.012   5.991                   9.00    8.35    9.21
    9     6.0-6.22     6.17    6.004   5.991                   8.67    8.44    9.21
    10    5.6-6.2      6.08    5.999   5.991     8.6-9.6       9.04    8.51    9.21
    ∞                          5.991   5.991                           9.21    9.21
4   4     7.5-7.8      7.54    7.43    7.82      9.3-9.6       9.42    9.21    11.34
    5     7.32-7.8     7.54    7.52    7.82      9.72-9.96     9.87    9.66    11.34
    6     7.4-7.6      7.49    7.57    7.82                    10.00   9.95    11.34
    ∞                          7.82    7.82                            11.34   11.34
5   3     8.27-8.53    8.41    8.59    9.49      9.87-10.13    10.05   10.08   13.28
    ∞                          9.49    9.49                            13.28   13.28

* Computed by linear interpolation of probabilities.


Table II gives, for a very much larger number of values of $m$ and $n$, the .05 and .01 values of $\chi_r^2$ computed on the basis of the $z$ test with continuity corrections. The values entered for $m = \infty$ are obtained from $\chi^2$ tables for $n - 1$ degrees of freedom. They are the significance values given by the $\chi^2$ test for all values of $m$. It is apparent that as $m$ increases the .01 and .05 values of $\chi_r^2$ approach their limiting values very rapidly. Take $n = 7$ and the difference between the values for $m = 3$ and $m = \infty$. By the time $m = 10$, two-thirds of this difference has disappeared for the .05 values, and an even larger proportion for the .01 values. The situation is similar for the other values of $n$. Except for the .05 values for $n = 3$, the approach to the limit is monotonic from below. The use of the $\chi^2$ test thus tends to lead to overestimation of the significance values and of the probabilities attached to observed values of $\chi_r^2$. It is clear, however, that for large and even moderate values of $m$ the $\chi^2$ test is, for all practical purposes, equivalent to the $z$ test.

⁴ These comparisons duplicate some of those made by Kendall and Smith and merely serve to confirm their conclusion that the $z$ test with continuity corrections gives exceedingly good results.

⁵ The values obtained using the $z$ test without continuity corrections agree less well with the exact values than those obtained with the aid of the continuity corrections. However, even if no corrections are made, the $z$ test in general yields values closer to the exact values than does the $\chi^2$ test.

TABLE II

Values of $\chi_r^2$ at .05 and .01 Levels of Significance Computed on the Basis of Kendall and Smith's $z$ Test, with Continuity Corrections; .10, .075, .02, .015 Values of $\chi^2$

                           n
m            3       4       5       6       7

Values at .05 Level of Significance
3                            8.59    9.90    11.24
4                    7.43    8.84    10.24   11.62
5                    7.52    8.98    10.42   11.84
6                    7.57    9.08    10.54   11.97
8            6.012   7.63    9.18    10.68   12.14
10           5.999   7.67    9.25    10.76   12.23
15           5.985   7.72    9.33    10.87   12.36
20           5.983   7.74    9.37    10.92   12.42
100          5.987   7.80    9.46    11.04   12.56
∞            5.991   7.82    9.49    11.07   12.59
χ² (.10)     4.605   6.25    7.78    9.24    10.64
χ² (.075)*   5.18    6.90    8.49    10.00   11.45

Values at .01 Level of Significance
3                            10.08   11.69   13.26
4                    9.21    10.93   12.59   14.19
5                    9.66    11.42   13.11   14.74
6                    9.96    11.74   13.46   15.09
8            8.35    10.31   12.13   13.87   15.53
10           8.51    10.52   12.37   14.11   15.79
15           8.74    10.79   12.67   14.44   16.14
20           8.85    10.93   12.82   14.60   16.31
100          9.14    11.26   13.19   14.99   16.71
∞            9.21    11.34   13.28   15.09   16.81
χ² (.02)     7.82    9.84    11.67   13.39   15.03
χ² (.015)*   8.40    10.46   12.34   14.09   15.77

* Computed from Fisher and Yates' Table IV [4] by linear interpolation between the logarithms of the probabilities.


To determine more precisely the range of values of $m$ and $n$ for which the $\chi^2$ approximation is adequate, some convention must be adopted about how much error in the estimated significance values of $\chi_r^2$ is tolerable. The conclusion drawn from an observed $\chi_r^2$ depends on the probability that it will be exceeded by chance. This convention therefore should clearly be expressed in terms of the error in the probability.

The structure of published $\chi^2$ tables makes it convenient to adopt the following tolerances. An estimated probability between .10 and .05 is accepted as a tolerable approximation to a correct probability of .05. An estimated probability between .02 and .01 is accepted as a tolerable approximation to a correct probability of .01. These ranges of tolerance lie entirely on one side of the correct probability because, as pointed out above, the error in using the $\chi^2$ test is consistent in direction. These ranges are purely arbitrary, of course, and many may think them too broad.

On the basis of this or some similar convention it is possible to make objective statements concerning the range of values of $m$ and $n$ for which the $\chi^2$ test is adequate. The next to the last line in the first section of Table II gives the .10 values of $\chi^2$; the next to the last line in the second section gives the .02 values. All the .05 values of $\chi_r^2$ shown in the table exceed the .10 value of $\chi^2$. With two exceptions for $n = 3$, the $\chi^2$ test would assign all of these values a probability greater than .05 but less than .10. Thus the error made at the .05 level is within the admissible range according to the suggested convention. The $\chi^2$ test is therefore an adequate substitute for the $z$ test at the .05 level for all values of $m$ and $n$, except possibly for a few of the values for which exact distributions are available.
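The convention can be applied without printed tables if $\chi^2$ tail probabilities are computed directly. For integer degrees of freedom the upper tail has a closed form. It is a finite Poisson sum for even df, and an `erfc` term plus a finite series for odd df. The sketch below uses these forms and inverts them by bisection. The function names are our own, and the code is meant as a check on the tabled $\chi^2$ values, not a replacement for them.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df > x) for a positive integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: P = erfc(sqrt(x/2)) for df = 1, then add one term per step df -> df + 2
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)   # increment from df = 1 to df = 3
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k                                      # next increment
    return total


def chi2_isf(p: float, df: int) -> float:
    """Value x with P(chi^2_df > x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def within_convention(chi_r2: float, n: int, level: float, upper: float) -> bool:
    """True if the chi^2 test (n - 1 df) assigns chi_r2 a probability in [level, upper]."""
    return level <= chi2_sf(chi_r2, n - 1) <= upper


print(round(chi2_isf(0.05, 4), 3), round(chi2_isf(0.10, 4), 3))
print(within_convention(8.84, 5, 0.05, 0.10), within_convention(6.012, 3, 0.05, 0.10))
```

For example, the .05 value 8.84 for $n = 5$, $m = 4$ falls inside the .05 to .10 band. The value 6.012 for $n = 3$, $m = 8$ falls just below .05, which is one of the two exceptions noted above.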

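As might be expected, the $\chi^2$ test is less satisfactory at the .01 level. For $m$ less than 6, the .01 values of $\chi_r^2$ given by the $z$ test with continuity corrections are below the .02 value of $\chi^2$. For $m$ greater than 5, however, the $\chi^2$ test would assign all the .01 values of $\chi_r^2$ in Table II a probability greater than .01 but less than .02. As already noted, this is the range of values of $m$ for which the earlier paper [2, p. 695] suggested the $\chi^2$ test could validly be used.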

Because the convention on the permissible error in the probability attached to an observed $\chi_r^2$ is arbitrary, it is interesting to investigate the effect of an alternative and stricter convention. Under it, only probabilities from .075 to .05 and from .015 to .01 are accepted as approximations to correct probabilities of .05 and .01 respectively. The .075 and .015 values of $\chi^2$ are given in the last lines of the two sections of Table II. On the basis of this convention the $\chi^2$ test is adequate at the .05 level for $m$ greater than three, and

TABLE III

Values of $S$ at .05 and .01 Levels of Significance Computed on the Basis of Kendall and Smith's $z$ Test, with Continuity Corrections

                             n                                  Additional values
m        3        4        5        6        7                  for n = 3
                                                                m       S
Values at .05 Level of Significance
3                          64.4              157.3              9       54.0
4                 49.5     88.4     143.3                       12      71.9
5                 62.6     112.3    182.4    276.2              14      83.8
6                 75.7     136.1             335.2              16      95.8
8        48.1     101.7    183.7             453.1              18      107.7
10       60.0     127.8    231.2    376.7
15       89.8     192.9    349.8             864.9
20       119.7    258.0    468.5    764.4    1158.7

Values at .01 Level of Significance
3                          75.6     122.8    185.6              9       76.9
4                 61.4              176.2    265.0              12      103.6
5                 80.5     142.8    229.4    343.8              14      121.9
6                 99.6     176.1    282.4    422.6              16      140.2
8        66.8     137.4    242.7    388.3    579.9              18      158.6
10       85.1     175.3             494.0    737.0
15       131.0    269.8    475.2    758.2    1129.6
20       177.0    364.2    641.2    1022.2   1521.9


at the .01 level for $m$ greater than nine, except possibly for a few of the values for which exact distributions are available. Thus even halving the permissible margin of error, a drastic lowering, only slightly limits the range of values of $m$ for which the $\chi^2$ test is adequate.

Table II provides, of course, a direct means of testing the significance of observed values of $\chi_r^2$ for the tabled values of $m$ and $n$. For this purpose, however, Table III, giving the significance values of $S$, is more useful, since it removes the need to convert $S$ into $\chi_r^2$. For $n = 3$, Table III includes a few values of $m$ in addition to those in Table II.
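Since $W = 12S/\{m^2(n^3 - n)\}$ and $\chi_r^2 = m(n - 1)W$, the two tables are linked by $\chi_r^2 = 12S/\{mn(n + 1)\}$. A short Python sketch of the conversion, with illustrative names:

```python
def chi_r2_from_s(S, m, n):
    """chi_r^2 = m(n - 1) W with W = 12 S / (m^2 (n^3 - n)), i.e. 12 S / (m n (n + 1))."""
    return 12 * S / (m * n * (n + 1))


def s_from_chi_r2(chi_r2, m, n):
    """Inverse of chi_r2_from_s."""
    return chi_r2 * m * n * (n + 1) / 12


print(round(chi_r2_from_s(64.4, 3, 5), 2))
```

For example, the .05 entry 64.4 of Table III for $m = 3$, $n = 5$ converts to about 8.59, the corresponding entry of Table II.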

SUMMARY

The preceding analysis suggests that the $\chi^2$ test of the significance of $\chi_r^2$ (or $W$ or $\eta_r^2$) is less accurate than the $z$ test proposed by Kendall and Smith. It is nevertheless adequate for practical purposes at the .01 level of significance if the number of sets of ranks ($m$) is greater than 5. At the .05 level it is adequate for any number of sets of ranks, provided the number of ranks in each set ($n$) is more than 3. Exact distributions are now available for $n = 3$, $m = 3$ to 10; $n = 4$, $m = 3$ to 6; and $n = 5$, $m = 3$ [1]. The .05 and .01 values of $\chi_r^2$ and $S$, computed using the Kendall and Smith $z$ test with continuity corrections, are given in Tables II and III of the present note for $n = 3$ to 7 and selected values of $m$ from 3 to 100. For $n$ greater than 7 and $m$ less than 6, the $z$ test with continuity corrections should be employed. For all other combinations of $n$ and $m$ not covered by the exact distributions or by Tables II and III, the $\chi^2$ test is adequate.
NOTES

This section is devoted to brief research and expository articles, notes on methodology and other short items.


NOTE ON AN APPROXIMATE FORMULA FOR THE SIGNIFICANCE LEVELS OF z

By W. G. Cochran


1. Introduction. An important part has been played in modern statistical analysis by the distribution of $z = \tfrac{1}{2}\log_e(s_1^2/s_2^2)$, where $s_1^2$ and $s_2^2$ are two independent estimates of the same variance. In particular, all tests of significance in the analysis of variance and in multiple regression problems are based on this distribution. Complete tabulation of the frequency distribution of $z$ is a heavy task, because the distribution is a two-parameter one. The parameters are the numbers of degrees of freedom, $n_1$ and $n_2$, in the estimates $s_1^2$ and $s_2^2$. Thus each significance level of $z$ requires a separate two-way table. Fisher constructed a table of the 5 percent points in 1925 [1]. Several workers [2] have since extended this to the 20, 1, and 0.1 percent levels, for a somewhat wider range of values of $n_1$ and $n_2$.

With his original table, Fisher gave an approximate formula for the 5 percent values of $z$, for high values of $n_1$ and $n_2$ outside the limits of his table. The formula reads:

(1)   $$z\ (5\text{ percent}) = \frac{1.6449}{\sqrt{h - 1}} - 0.7843\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where $2/h = 1/n_1 + 1/n_2$.

The constant 1.6449 is the 5 percent significance level for a single tail of the normal distribution. The constant 0.7843 will be found to be $\tfrac{1}{6}\{2 + (1.6449)^2\}$. Thus the general formula for the significance levels of $z$ derivable from (1) is

$$z = \frac{x}{\sqrt{h - 1}} - \frac{x^2 + 2}{6}\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where $x$ is a normal deviate with unit standard error. This formula has been extended [2], by inserting the appropriate significance level of $x$, to the tables of the 20, 1, and 0.1 percent levels of $z$. It commonly appears with all published tables of $z$. The objects of this note are to indicate the derivation of the formula and to suggest an improvement upon it in the latter cases.
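Formula (1) and its general form are easy to evaluate with the normal quantile function from Python's standard library. This sketch treats the significance level as a single upper tail, as the text does for 1.6449. It also exposes the constant subtracted from $h$ as a parameter. The name `fisher_z_approx` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_approx(n1, n2, upper_tail, lam=1.0):
    """x / sqrt(h - lam) - (x^2 + 2)/6 * (1/n1 - 1/n2), with 2/h = 1/n1 + 1/n2.

    lam = 1 gives Fisher's formula (1); x is the one-tailed normal deviate.
    """
    x = NormalDist().inv_cdf(1 - upper_tail)
    h = 2 / (1 / n1 + 1 / n2)
    return x / math.sqrt(h - lam) - (x * x + 2) / 6 * (1 / n1 - 1 / n2)


print(round(fisher_z_approx(40, 60, 0.05), 4))
print(round((1.6449 ** 2 + 2) / 6, 4))   # the constant 0.7843
```

With $n_1 = 40$, $n_2 = 60$ and the 5 percent level this gives about .2334, the value the note quotes for formula (1).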
2. The transformation of the z-distribution to normality. For high values of $n_1$ and $n_2$, the distribution of $z$ approaches the normal distribution. The principal deviation is a slight skewness introduced by the inequality of $n_1$ and $n_2$. It is therefore natural to seek an approximate formula for the distribution of $z$ by examining its relation to the normal distribution. For the $z$-distribution, the ratio $\kappa_r/\kappa_2^{r/2}$ is of the order $n^{1 - r/2}$, where $\kappa_r$ is the $r$th cumulant and $n$ is the smaller of $n_1$ and $n_2$. This property is common to a large number of distributions which tend to normality. An example is the distribution of the mean of a sample of size $n$ from any distribution with finite cumulants. Fisher and Cornish [3] have recently given a method, applicable to all distributions with this property, for transforming the distribution to a normal distribution to any desired order of approximation. They also obtained explicit expressions for the significance levels of the original distribution in terms of the significance levels of the normal distribution, discussing the $z$-distribution as a particular example. The relation between $z$ and the normal deviate $x$ at the same level of probability was found to be

(2)   $$z = \frac{x}{\sqrt{h}} - \frac{x^2 + 2}{6}\left(\frac{1}{n_1} - \frac{1}{n_2}\right) + \left\{\frac{x^3 + 3x}{12h\sqrt{h}} + \frac{x^3 + 11x}{144}\sqrt{h}\left(\frac{1}{n_1} - \frac{1}{n_2}\right)^2\right\},$$

The three terms on the right hand side are respectively of order $n^{-1/2}$, $n^{-1}$, and $n^{-3/2}$, so that terms of order $n^{-2}$ are neglected.¹

If this equation is compared with equation (1), the latter appears at first sight to be the approximation of order $n^{-1}$ to the $z$-distribution. The only difference is that the divisor of $x$ is $\sqrt{h - 1}$ in (1) and $\sqrt{h}$ in (2). Computation of a few values shows that at the 5 percent level, equation (1) is the better approximation. For example, for $n_1 = 40$, $n_2 = 60$, (1) gives $z$ (5 percent) $= .2334$. Equation (2), taken to terms of order $n^{-1}$, gives .2309, and the exact value is .2332.

Since

$$\frac{x}{\sqrt{h - 1}} = \frac{x}{\sqrt{h}} + \frac{x}{2h\sqrt{h}} + \text{terms of order } n^{-5/2},$$

Fisher's approximation differs from (2) by including a correction term of order $n^{-3/2}$. Consider the true correction terms of this order in equation (2). For finite values of $n_1$ and $n_2$, the term

$$\frac{x^3 + 11x}{144}\sqrt{h}\left(\frac{1}{n_1} - \frac{1}{n_2}\right)^2$$

is considerably smaller than the term

$$\frac{x^3 + 3x}{12h\sqrt{h}},$$

since the former has a smaller numerical coefficient and involves the difference between $1/n_1$ and $1/n_2$. Thus Fisher's formula gives a close approximation to the true formula of order $n^{-3/2}$, provided that $x/(2h\sqrt{h})$ is approximately equal to $(x^3 + 3x)/(12h\sqrt{h})$. That is, $(x^2 + 3)/6$ must be approximately equal to 1. For the 5 percent level, $x = 1.6449$ and $(x^2 + 3)/6 = 0.951$. Thus at the 5 percent level the use of $\sqrt{h - 1}$ in (1) instead of $\sqrt{h}$ extends the validity of Fisher's approximation from order $n^{-1}$ to order $n^{-3/2}$.

¹ Fisher and Cornish also gave the two succeeding terms.
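Equation (2) can be truncated after each order to see where Fisher's $\sqrt{h - 1}$ gains its accuracy. The sketch below evaluates the truncations for $n_1 = 40$, $n_2 = 60$ at the 5 percent level. It uses the terms as written in (2) and drops all higher orders. The function name is illustrative.

```python
import math
from statistics import NormalDist


def cornish_fisher_z(n1, n2, upper_tail, order=3):
    """Equation (2) truncated: order 1 -> x/sqrt(h); 2 adds the n^-1 term; 3 adds the n^-3/2 terms."""
    x = NormalDist().inv_cdf(1 - upper_tail)
    h = 2 / (1 / n1 + 1 / n2)
    d = 1 / n1 - 1 / n2
    z = x / math.sqrt(h)
    if order >= 2:
        z -= (x * x + 2) / 6 * d
    if order >= 3:
        z += (x ** 3 + 3 * x) / (12 * h * math.sqrt(h))
        z += (x ** 3 + 11 * x) / 144 * math.sqrt(h) * d * d
    return z


for order in (1, 2, 3):
    print(order, round(cornish_fisher_z(40, 60, 0.05, order), 4))
```

The second-order truncation reproduces .2309. Adding the $n^{-3/2}$ terms moves it to roughly .2333, close to the exact .2332 and to the .2334 of formula (1).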
This ingenious device, however, requires adjustment at other levels of significance. The values of $(x^2 + 3)/6$ at the principal significance levels are shown below.

Significance level (%)     40     30     20     10     5      1      0.1
λ = (x² + 3)/6             0.51   0.55   0.62   0.77   0.95   1.40   2.09

If $\sqrt{h - 1}$ in formula (1) is replaced by $\sqrt{h - \lambda}$, with the above values of $\lambda$, Fisher's formula will be approximately valid to order $n^{-3/2}$ at all levels of significance. In particular, for the tables already published of the 20, 1 and 0.1 percent points, $\lambda$ may be taken as 0.6, 1.4 and 2.1 respectively. The values of $z$ given by the use of $\sqrt{h - 1}$ and $\sqrt{h - \lambda}$ are compared below for $n_1 = 24$, $n_2 = 60$.²


Significance Level 

Approximate formula 

Exact value 

i 1 

1 

> 

■y/h — X 

20% 

.1346 

.1337 

.1338 

1% 

.3723 

.3748 

.3746 

0.1% 

.4875 

.4966 

.4956 


The use of y/h — \ gives values practically correct to 4 decimal placesi 
except for the 0.1 level of significance, at which the liigher terms become more 
important 

With the aid of this formula, complete tabulation of the z-distribution for a 
given pair of high values of ni and m is relatively simple. If very low proba¬ 
bilities at the tails arc required, the further approximations given by Fisher and 
Cornish [3] may be used. 
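Cochran's modification only changes the constant subtracted from $h$. The sketch below computes $\lambda = (x^2 + 3)/6$ from the normal quantile. It then re-evaluates the comparison for $n_1 = 24$, $n_2 = 60$ with both divisors, using the rounded values 0.6, 1.4 and 2.1 taken from the text. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_lambda(upper_tail):
    """lambda = (x^2 + 3) / 6 for the one-tailed normal deviate x."""
    x = NormalDist().inv_cdf(1 - upper_tail)
    return (x * x + 3) / 6


def z_point(n1, n2, upper_tail, lam):
    """Fisher's formula with sqrt(h - 1) replaced by sqrt(h - lam)."""
    x = NormalDist().inv_cdf(1 - upper_tail)
    h = 2 / (1 / n1 + 1 / n2)
    return x / math.sqrt(h - lam) - (x * x + 2) / 6 * (1 / n1 - 1 / n2)


for level in (0.40, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001):
    print(f"{level:.1%}  lambda = {cochran_lambda(level):.2f}")

for level, lam in [(0.20, 0.6), (0.01, 1.4), (0.001, 2.1)]:
    print(f"{level:.1%}  {z_point(24, 60, level, 1.0):.4f}  {z_point(24, 60, level, lam):.4f}")
```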
² The numerical terms in the approximate formula given for the 20 percent points on p. 28 of Fisher and Yates' Statistical Tables are in error. Their formula should read:

$$z = \frac{0.8416}{\sqrt{h - 1}} - 0.4514\left(\frac{1}{n_1} - \frac{1}{n_2}\right).$$
A NOTE ON THE ANALYSIS OF VARIANCE WITH UNEQUAL CLASS FREQUENCIES¹

By Abraham Wald²

Let us consider $p$ groups of variates and denote by $m_j$ ($j = 1, \ldots, p$) the number of elements in the $j$-th group. Let $x_{ij}$ be the $i$-th element of the $j$-th group. We assume that $x_{ij}$ is the sum of two variates $\epsilon_{ij}$ and $\eta_j$, i.e. $x_{ij} = \epsilon_{ij} + \eta_j$, where $\epsilon_{ij}$ ($i = 1, \ldots, m_j$; $j = 1, \ldots, p$) is normally distributed with mean $\mu$ and variance $\sigma^2$, and $\eta_j$ ($j = 1, \ldots, p$) is normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the variates $\epsilon_{ij}$ and $\eta_j$ are supposed to be distributed independently.

The intraclass correlation $\rho$ is given by³

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$


Confidence limits for $\rho$ have been derived only in the case of equal class frequencies, i.e. $m_1 = m_2 = \cdots = m_p$. In this paper we shall deal with the problem of determining the confidence limits for $\rho$ in the case of unequal class frequencies. Since $\rho = (\sigma'^2/\sigma^2)/(1 + \sigma'^2/\sigma^2)$ is a monotonically increasing function of $\sigma'^2/\sigma^2$, our problem is solved if we derive confidence limits for $\sigma'^2/\sigma^2$.
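The monotone relation can be made explicit. Writing $\lambda^2 = \sigma'^2/\sigma^2$ (the notation the note introduces below), $\rho = \lambda^2/(1+\lambda^2)$, which increases with $\lambda^2$, so confidence limits for $\lambda^2$ map directly, and in the same order, to limits for $\rho$. A minimal Python sketch (function names are ours):

```python
def rho_from_lambda_sq(lam_sq: float) -> float:
    """Intraclass correlation rho = sigma'^2 / (sigma^2 + sigma'^2),
    written in terms of lam_sq = sigma'^2 / sigma^2."""
    if lam_sq < 0:
        raise ValueError("lam_sq must be non-negative")
    return lam_sq / (1.0 + lam_sq)


def rho_limits(lam_sq_lower: float, lam_sq_upper: float) -> tuple[float, float]:
    """Map confidence limits for lam_sq to limits for rho; the map is increasing,
    so the lower limit stays the lower limit."""
    return rho_from_lambda_sq(lam_sq_lower), rho_from_lambda_sq(lam_sq_upper)


print(rho_limits(0.25, 3.0))
```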


Denote by $\bar x_j$ the arithmetic mean of the $j$-th group, i.e.

(1) $$\bar x_j = \frac{\sum_i x_{ij}}{m_j} = \frac{\sum_i \epsilon_{ij}}{m_j} + \eta_j.$$


Hence the variance of $\bar x_j$ is equal to

(2) $$\frac{\sigma^2}{m_j} + \sigma'^2.$$


Denote $\sigma'^2/\sigma^2$ by $\lambda^2$. Then we have

(3) $$\sigma^2_{\bar x_j} = \sigma^2\left(\frac{1}{m_j} + \lambda^2\right) = \frac{\sigma^2}{w_j},$$


¹The author is indebted to Professor H. Hotelling for formulating the problem dealt with in this paper.

²Research under a grant-in-aid from the Carnegie Corporation of New York.

³See for instance R. A. Fisher, Statistical Methods for Research Workers, 6th edition, p. 228.
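where

(4) $$w_j = \frac{1}{1/m_j + \lambda^2} = \frac{m_j}{1 + m_j\lambda^2}.$$

Now we shall prove that

(5) $$\frac{1}{\sigma^2}\sum_j w_j(\bar x_j - \bar x)^2, \qquad \bar x = \frac{\sum_j w_j \bar x_j}{\sum_j w_j},$$

has the $\chi^2$-distribution with $p - 1$ degrees of freedom.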


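Let

$$y_j = \sqrt{w_j}\,\bar x_j \qquad (j = 1, \ldots, p)$$

and consider the orthogonal transformation

$$y'_1 = L_1(y_1, \ldots, y_p),\ \ldots,\ y'_{p-1} = L_{p-1}(y_1, \ldots, y_p),\qquad y'_p = \frac{\sqrt{w_1}\,y_1 + \cdots + \sqrt{w_p}\,y_p}{\sqrt{w_1 + \cdots + w_p}},$$

where $L_1(y_1, \ldots, y_p), \ldots, L_{p-1}(y_1, \ldots, y_p)$ denote arbitrary homogeneous linear functions subject to the only condition that the transformation should be orthogonal.

Since the mean value of $y_j$ is equal to $\sqrt{w_j}\,(\mu + \mu')$ and the variance of $y_j$ is equal to $\sigma^2$, we obviously have: the mean value of $y'_j$ ($j = 1, \ldots, p - 1$) is equal to zero, and the variance of $y'_j$ ($j = 1, \ldots, p$) is equal to $\sigma^2$. (The vector of means is proportional to the coefficients of $y'_p$ and hence orthogonal to the coefficients of $y'_1, \ldots, y'_{p-1}$; being an orthogonal transformation of independent normal variates with common variance $\sigma^2$, the $y'_j$ are themselves independent.) In order to prove our statement, we have only to show that the expression (5) is equal to $\frac{1}{\sigma^2}(y'^2_1 + \cdots + y'^2_{p-1})$. If we substitute $y_j/\sqrt{w_j}$ for $\bar x_j$ in (5), we get

$$\frac{1}{\sigma^2}\left[\sum_j y_j^2 - \frac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j}\right] = \frac{1}{\sigma^2}\left(\sum_j y_j^2 - y'^2_p\right) = \frac{1}{\sigma^2}\left(y'^2_1 + \cdots + y'^2_{p-1}\right).$$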
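The algebraic identity used in this step can be checked numerically. The sketch below (hypothetical group sizes, group means and $\lambda^2$; function names are ours) computes $\sum_j w_j(\bar x_j - \bar x)^2$ directly and as $\sum_j y_j^2 - y'^2_p$ with $y_j = \sqrt{w_j}\,\bar x_j$; the two agree up to rounding.

```python
import math


def weighted_between_ss(means, weights):
    """sum_j w_j (xbar_j - xbar)^2 with xbar the w-weighted mean (sigma^2 times expression (5))."""
    xbar = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - xbar) ** 2 for w, m in zip(weights, means))


def via_orthogonal_split(means, weights):
    """The same quantity as sum_j y_j^2 - y'_p^2, with y_j = sqrt(w_j) * xbar_j."""
    y = [math.sqrt(w) * m for w, m in zip(weights, means)]
    y_p = sum(math.sqrt(w) * yj for w, yj in zip(weights, y)) / math.sqrt(sum(weights))
    return sum(v * v for v in y) - y_p ** 2


# Hypothetical data, for illustration only.
sizes = [3, 2, 4, 5]                 # class frequencies m_j
means = [5.13, 7.65, 3.45, 7.20]     # group means xbar_j
lam_sq = 0.7                         # a trial value of lambda^2
weights = [m / (1 + m * lam_sq) for m in sizes]   # w_j from (4)
print(weighted_between_ss(means, weights), via_orthogonal_split(means, weights))
```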
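Since $\sum_j \sum_i (x_{ij} - \bar x_j)^2/\sigma^2$ has the $\chi^2$-distribution with $N - p$ degrees of freedom and is distributed independently of the group means $\bar x_j$, the expression

(6) $$F = \frac{N - p}{p - 1} \cdot \frac{\sum_j w_j(\bar x_j - \bar x)^2}{\sum_j \sum_i (x_{ij} - \bar x_j)^2}$$

has the analysis of variance distribution with $p - 1$ and $N - p$ degrees of freedom, where $N = m_1 + \cdots + m_p$. In case $m_1 = m_2 = \cdots = m_p = m$ we have

(6') $$F = \frac{N - p}{p - 1} \cdot \frac{m \sum_j (\bar x_j - \bar x)^2}{(1 + m\lambda^2) \sum_j \sum_i (x_{ij} - \bar x_j)^2} = \frac{\bar F}{1 + m\lambda^2},$$

where $\bar x = \frac{1}{p}\sum_j \bar x_j$ and

$$\bar F = \frac{N - p}{p - 1} \cdot \frac{m \sum_j (\bar x_j - \bar x)^2}{\sum_j \sum_i (x_{ij} - \bar x_j)^2}.$$

Hence

$$\lambda^2 = \frac{1}{m}\left(\frac{\bar F}{F} - 1\right).$$

If $F_1$ and $F_2$ denote the lower and the upper confidence limit of $F$, we obtain for $\lambda^2$ the confidence limits

$$\frac{1}{m}\left(\frac{\bar F}{F_2} - 1\right) \quad\text{and}\quad \frac{1}{m}\left(\frac{\bar F}{F_1} - 1\right).$$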
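For equal class frequencies these limits are available in closed form. In the Python sketch below the percentage points F1 < F2 must be supplied by the user, e.g. from an F table with $p - 1$ and $N - p$ degrees of freedom. The group means, within-group sum of squares and the values 0.1 and 4.0 in the example are hypothetical placeholders, not tabulated quantiles.

```python
def equal_frequency_limits(group_means, ss_within, m, F1, F2):
    """Closed-form limits for lambda^2 when every group has m observations.

    F_bar is statistic (6') at lambda^2 = 0, so that F = F_bar / (1 + m * lambda^2).
    F1 < F2 are the lower and upper percentage points of F.
    Returns (F_bar, lower, upper); a negative limit means the corresponding
    equation has no root with lambda^2 >= 0.
    """
    p = len(group_means)
    N = m * p
    xbar = sum(group_means) / p
    between = sum((x - xbar) ** 2 for x in group_means)
    F_bar = (N - p) / (p - 1) * m * between / ss_within
    lower = (F_bar / F2 - 1.0) / m   # F = F2 gives the lower limit of lambda^2
    upper = (F_bar / F1 - 1.0) / m   # F = F1 gives the upper limit of lambda^2
    return F_bar, lower, upper


F_bar, lower, upper = equal_frequency_limits([4.0, 6.5, 5.2], ss_within=12.0, m=4, F1=0.1, F2=4.0)
print(F_bar, lower, upper)
```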




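Let us now consider the general case that $m_1, \ldots, m_p$ are arbitrary positive integers. First we shall show that the set of values of $\lambda^2$ for which (6) lies between its confidence limits $F_1$ and $F_2$ is an interval. For this purpose we have only to show that

$$f(\lambda^2) = \sum_j w_j(\bar x_j - \bar x)^2, \qquad \bar x = \frac{\sum_j w_j \bar x_j}{\sum_j w_j},$$

is monotonically decreasing with $\lambda^2$. In fact

$$\frac{df(\lambda^2)}{d\lambda^2} = \sum_j \frac{dw_j}{d\lambda^2}(\bar x_j - \bar x)^2 - 2\,\frac{d\bar x}{d\lambda^2}\sum_j w_j(\bar x_j - \bar x).$$

Since

$$\sum_j w_j(\bar x_j - \bar x) = 0 \quad\text{and}\quad \frac{dw_j}{d\lambda^2} = -\frac{m_j^2}{(1 + m_j\lambda^2)^2} = -w_j^2,$$

we have

$$\frac{df(\lambda^2)}{d\lambda^2} = -\sum_j w_j^2(\bar x_j - \bar x)^2 \le 0,$$

with equality only if all the $\bar x_j$ are equal, which proves our statement.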
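The derivative formula can be checked against a central finite difference. A Python sketch with hypothetical group sizes and means (function names are ours):

```python
def f_of(lam_sq, means, sizes):
    """f(lambda^2) = sum_j w_j (xbar_j - xbar)^2 with w_j = m_j / (1 + m_j * lambda^2)."""
    w = [m / (1 + m * lam_sq) for m in sizes]
    xbar = sum(wj * mu for wj, mu in zip(w, means)) / sum(w)
    return sum(wj * (mu - xbar) ** 2 for wj, mu in zip(w, means))


def df_analytic(lam_sq, means, sizes):
    """Derivative -sum_j w_j^2 (xbar_j - xbar)^2 obtained in the text."""
    w = [m / (1 + m * lam_sq) for m in sizes]
    xbar = sum(wj * mu for wj, mu in zip(w, means)) / sum(w)
    return -sum(wj ** 2 * (mu - xbar) ** 2 for wj, mu in zip(w, means))


sizes = [3, 2, 4, 5]                 # hypothetical unequal class frequencies
means = [5.13, 7.65, 3.45, 7.20]     # hypothetical group means
step = 1e-6
for lam_sq in (0.1, 0.5, 2.0):
    numeric = (f_of(lam_sq + step, means, sizes) - f_of(lam_sq - step, means, sizes)) / (2 * step)
    print(lam_sq, numeric, df_analytic(lam_sq, means, sizes))
```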
Hence the lower confidence limit $\lambda_1$ of $\lambda^2$ is given by the root of the equation in $\lambda^2$

(7) $$F = F_2,$$

and the upper confidence limit $\lambda_2$ of $\lambda^2$ is given by the root of the equation in $\lambda^2$

(8) $$F = F_1.$$

Since $f(\lambda^2)$ is monotonically decreasing, the equations (7) and (8) have at most one root in $\lambda^2$. If the equation (7) or (8) has no root, the corresponding confidence limit has to be put equal to zero. If neither (7) nor (8) has a root, we have to reject at least one of the hypotheses:

(1) $x_{ij} = \epsilon_{ij} + \eta_j$;

(2) the variates $\epsilon_{ij}$ and $\eta_j$ ($i = 1, \ldots, m_j$; $j = 1, \ldots, p$) are normally and independently distributed;

(3) each of the variates $\epsilon_{ij}$ has the same distribution;

(4) each of the variates $\eta_j$ has the same distribution.

The equations (7) and (8) are complicated algebraic equations in $\lambda^2$. For the actual calculation of the roots of these equations, well-known approximation methods can be applied, making use also of the fact that the left members are monotonic functions of $\lambda^2$. In applying any approximation method it is very useful to start with two limits of the root which do not lie far apart. We shall give here a method of finding such limits.
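Because $F$ is continuous and decreasing in $\lambda^2$ and tends to zero as $\lambda^2 \to \infty$ (every $w_j \to 0$), plain bisection is one such approximation method. The sketch below is our illustration, not a procedure given in the note. The data and the values of F1 and F2 are hypothetical placeholders, not tabulated F quantiles.

```python
def wald_F(lam_sq, groups):
    """Statistic (6): (N-p)/(p-1) * sum_j w_j (xbar_j - xbar)^2 / within-group SS."""
    p = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    w = [len(g) / (1 + len(g) * lam_sq) for g in groups]          # w_j from (4)
    xbar = sum(wj * mu for wj, mu in zip(w, means)) / sum(w)      # weighted mean
    f = sum(wj * (mu - xbar) ** 2 for wj, mu in zip(w, means))
    return (N - p) / (p - 1) * f / ss_within


def root_decreasing(func, target, rel_tol=1e-12):
    """Root of func(t) = target for t >= 0, func continuous and decreasing to 0.

    Returns 0.0 when func(0) < target, following the note's rule that the
    corresponding limit is then put equal to zero."""
    if target <= 0:
        raise ValueError("target must be positive")
    if func(0.0) < target:
        return 0.0
    lo, hi = 0.0, 1.0
    while func(hi) >= target:        # expand until the root is bracketed
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:    # bisect
        mid = 0.5 * (lo + hi)
        if func(mid) >= target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data with unequal class frequencies.
groups = [[4.1, 5.3, 6.0], [7.2, 8.1], [3.0, 2.5, 4.4, 3.9], [6.6, 7.9, 7.0, 8.4, 6.1]]
F1, F2 = 0.5, 3.0   # placeholder percentage points
lam1 = root_decreasing(lambda t: wald_F(t, groups), F2)   # lower limit, equation (7)
lam2 = root_decreasing(lambda t: wald_F(t, groups), F1)   # upper limit, equation (8)
print(lam1, lam2)
```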

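Denote by $\tilde F$ the function which we obtain from $F$ (formula (6)) by substituting $l_j$ for $m_j$ in the expression (4) for $w_j$ ($j = 1, \ldots, p$). Let $\tilde f$ be the function obtained from $f$ by the same process. Denote by $\varphi(m, \lambda^2)$ the function which we obtain from $\tilde F$ by substituting $m$ for $l_1, \ldots, l_p$. We shall first show that $\tilde F$ is non-decreasing with increasing $l_k$ ($k = 1, \ldots, p$), i.e. $\partial \tilde F/\partial l_k \ge 0$. For this purpose we have only to show that $\partial \tilde f/\partial l_k \ge 0$. We have, since the term arising from the dependence of $\bar x$ on $l_k$ vanishes as before,

$$\frac{\partial \tilde f}{\partial l_k} = \frac{\partial w_k}{\partial l_k}(\bar x_k - \bar x)^2 = \frac{(\bar x_k - \bar x)^2}{(1 + l_k\lambda^2)^2} \ge 0.$$

Hence our statement is proved. Denote by $m'$ the smallest and by $m''$ the greatest of the values $m_1, \ldots, m_p$. Then we obviously have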
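(9) $$\varphi(m', \lambda^2) \le F \le \varphi(m'', \lambda^2).$$

Denote by $\lambda_1', \lambda_1'', \lambda_2', \lambda_2''$ the roots in $\lambda^2$ of the following equations, respectively:

$$\varphi(m', \lambda^2) = F_2; \qquad \varphi(m'', \lambda^2) = F_2; \qquad \varphi(m', \lambda^2) = F_1; \qquad \varphi(m'', \lambda^2) = F_1.$$

Since $F$ is monotonically decreasing with increasing $\lambda^2$, on account of (7), (8), and (9) we obviously have

$$\lambda_1' \le \lambda_1 \le \lambda_1''$$

and

$$\lambda_2' \le \lambda_2 \le \lambda_2''.$$

The above inequalities give us the required limits. Since $\varphi(m, \lambda^2)$ is the statistic of the case of equal class frequencies with every class frequency equal to $m$, each of these four roots can be written down explicitly, as in that case.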
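Because $\varphi(m, \lambda^2) = \varphi_0/(1 + m\lambda^2)$ with all weights equal, each bounding root is explicit. The Python sketch below (hypothetical data and placeholder F1, F2; names ours) computes the bounds from $m'$ and $m''$ and checks that they bracket the exact roots found by bisection.

```python
def _summaries(groups):
    """Class frequencies, group means and the within-group sum of squares."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    return sizes, means, ss_within


def wald_F(lam_sq, groups):
    """Statistic (6) with w_j = m_j / (1 + m_j * lam_sq)."""
    sizes, means, ss_within = _summaries(groups)
    p, N = len(groups), sum(sizes)
    w = [m / (1 + m * lam_sq) for m in sizes]
    xbar = sum(wj * mu for wj, mu in zip(w, means)) / sum(w)
    f = sum(wj * (mu - xbar) ** 2 for wj, mu in zip(w, means))
    return (N - p) / (p - 1) * f / ss_within


def phi_root(m, target, groups):
    """Root in lambda^2 of phi(m, lambda^2) = target, where every w_j uses the common
    value m, so phi = phi0 / (1 + m * lambda^2); negative roots are clipped to 0."""
    sizes, means, ss_within = _summaries(groups)
    p, N = len(groups), sum(sizes)
    xbar = sum(means) / p                      # equal weights give the plain mean
    phi0 = (N - p) / (p - 1) * m * sum((mu - xbar) ** 2 for mu in means) / ss_within
    return max(0.0, (phi0 / target - 1.0) / m)


def root_decreasing(func, target, rel_tol=1e-12):
    """Bisection for func(t) = target, t >= 0, func decreasing to 0 (0.0 if no root)."""
    if func(0.0) < target:
        return 0.0
    lo, hi = 0.0, 1.0
    while func(hi) >= target:
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if func(mid) >= target else (lo, mid)
    return 0.5 * (lo + hi)


groups = [[4.1, 5.3, 6.0], [7.2, 8.1], [3.0, 2.5, 4.4, 3.9], [6.6, 7.9, 7.0, 8.4, 6.1]]
F1, F2 = 0.5, 3.0                           # placeholder percentage points
m_small = min(len(g) for g in groups)       # m'
m_large = max(len(g) for g in groups)       # m''
lam1 = root_decreasing(lambda t: wald_F(t, groups), F2)
lam2 = root_decreasing(lambda t: wald_F(t, groups), F1)
lo1, hi1 = phi_root(m_small, F2, groups), phi_root(m_large, F2, groups)
lo2, hi2 = phi_root(m_small, F1, groups), phi_root(m_large, F1, groups)
print(f"{lo1:.4f} <= lambda_1 = {lam1:.4f} <= {hi1:.4f}")
print(f"{lo2:.4f} <= lambda_2 = {lam2:.4f} <= {hi2:.4f}")
```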

Columbia University,

New York, N. Y.


THE DISTRIBUTION OF QUADRATIC FORMS IN NON-CENTRAL
NORMAL RANDOM VARIABLES

By William G. Madow¹

The following theorem is the algebraic basis of the theorem of R. A. Fisher and W. G. Cochran which states necessary and sufficient conditions that a set of quadratic forms in normally and independently distributed random variables should themselves be independently distributed in $\chi^2$-distributions.²

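Theorem I. If the real quadratic forms $q_1, \ldots, q_m$ in $x_1, \ldots, x_n$ are such that

(1) $$\sum_\gamma q_\gamma = \sum_i x_i^2,$$

and if the rank of $q_\gamma$ is $n_\gamma$, then a necessary and sufficient condition that

(2) $$q_\gamma = \sum_\alpha z_\alpha^2,$$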

¹The letters $i, j, \mu, \nu$ will assume all integral values from 1 through $n$, the letter $\gamma$ will assume all integral values from 1 through $m$ ($n \ge m$), the letter $\alpha$ will assume all integral values from $n_0 + n_1 + \cdots + n_{\gamma-1} + 1$ through $n_1 + \cdots + n_\gamma$ ($n_0 = 0$, $n_1 + \cdots + n_m = n'$), the letters $\beta, \beta'$ will assume all integral values from 1 through $n'$, and the letters $r, s$ will assume all integral values from 1 through $n - 1$.

²The references are W. G. Cochran, "The Distribution of Quadratic Forms in a Normal System, with Applications to the Analysis of Covariance," Proc. Camb. Phil. Soc., Vol. … (1934), pp. 178-191, and R. A. Fisher, "Applications of 'Student's' Distribution," Metron, Vol. 5 (1925), pp. 90-104.
where the $z_\beta$, real linear functions of the $x_i$, are defined by

(3) $$x_i = \sum_\beta c_{i\beta} z_\beta,$$

is

(4) $$n' = n.$$

Furthermore, the system of linear forms (3) then constitutes an orthogonal transformation.

Proof: Necessity. Since the rank of a sum of quadratic forms is less than or equal to the sum of their ranks, it follows that $n' \ge n$. Upon substituting from (3) for the $x$'s in (1), and using (2), it is seen that, for all values of the $z$'s,

$$\sum_\beta z_\beta^2 = \sum_i\Big(\sum_\beta c_{i\beta} z_\beta\Big)^2 = \sum_{\beta,\beta'}\Big(\sum_i c_{i\beta} c_{i\beta'}\Big) z_\beta z_{\beta'},$$

and hence it follows that

(5) $$\sum_i c_{i\beta} c_{i\beta'} = \delta_{\beta\beta'},$$

where $\delta_{\beta\beta'} = 0$ if $\beta \ne \beta'$, and $\delta_{\beta\beta'} = 1$ if $\beta = \beta'$. However, since the rank of the system of linear forms (3) is not greater than $n$, and since the matrix of (5) is the product of the matrix of (3) by its transposed matrix, it follows that (5) can be true only if $n'$ is not greater than $n$. Consequently $n' = n$. It is then an immediate result of (5) that the transformation (3) is orthogonal.

Sufficiency. We assume that $n' = n$. By a real linear transformation of $x_1, \ldots, x_n$ we obtain linear forms $z_\alpha$ such that

$$q_\gamma = \sum_\alpha c_\alpha z_\alpha^2,$$

where $c_\alpha = 1$ or $-1$. The linear functions $z_1, \ldots, z_n$ are linearly independent; for if $z_n \not\equiv 0$, and if real numbers $h_1, \ldots, h_{n-1}$, not all zero, exist such that, say,

$$z_n = \sum_r h_r z_r,$$

then

$$z_n^2 = \sum_{r,s} h_r h_s z_r z_s.$$

Substituting, we have

$$\sum_\gamma q_\gamma = \sum_\alpha c_\alpha z_\alpha^2 = \sum_{r,s} H_{rs} z_r z_s = \sum_{r,s}\sum_{\mu,\nu} H_{rs}\, c^{r\mu} c^{s\nu} x_\mu x_\nu,$$

where $H_{rs} = c_r\delta_{rs} + c_n h_r h_s$ and $z_\alpha = \sum_\mu c^{\alpha\mu} x_\mu$. (It is not assumed here that the matrix of the $c^{\alpha\mu}$ is the inverse of the matrix of the $c_{i\alpha}$. That fact is a consequence of this proof.) Denoting the matrix of $z_1, \ldots, z_{n-1}$ by $C_{n-1}$, we see that the matrix of $\sum_\gamma q_\gamma$ is $C_{n-1}' H C_{n-1}$, where $H$ is the matrix of the $H_{rs}$; this matrix has rank less than or equal to $n - 1$, which contradicts the hypothesis (1), since $\sum_i x_i^2$ has rank $n$. Hence, if $C$ is the matrix having the elements $c_\alpha$ in its main diagonal and zeros elsewhere, and if $C_n$ is the matrix of $z_1, \ldots, z_n$, it follows that

$$C_n' C C_n = I,$$

where $I$ is the identity matrix, i.e. the matrix having ones in the main diagonal and zeros elsewhere, and $C_n$ is non-singular. Then $C = (C_n')^{-1}C_n^{-1} = (C_n C_n')^{-1}$ is positive definite; since its diagonal elements are $\pm 1$, $C$ is the identity matrix and $C_n$ is orthogonal.
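The Helmert matrix gives a concrete instance of Theorem I (our example, not the author's): $q_1 = n\bar x^2$ (rank 1) and $q_2 = \sum_i (x_i - \bar x)^2$ (rank $n - 1$) satisfy (1) with $n' = n$, and the rows of the Helmert matrix $H$ supply orthogonal $z$'s with $z = Hx$, i.e. (3) holds with $x = H'z$. The Python sketch below verifies orthonormality and the two identities $q_1 = z_1^2$, $q_2 = \sum_{\alpha \ge 2} z_\alpha^2$ at a random point.

```python
import math
import random


def helmert(n):
    """Rows of the n x n Helmert matrix: the first row is (1,...,1)/sqrt(n); row k
    (k = 1, ..., n-1) is (1,...,1, -k, 0,...,0)/sqrt(k(k+1)) with k leading ones."""
    rows = [[1 / math.sqrt(n)] * n]
    for k in range(1, n):
        c = 1 / math.sqrt(k * (k + 1))
        rows.append([c] * k + [-k * c] + [0.0] * (n - k - 1))
    return rows


def q1(x):
    """n * xbar^2, a quadratic form of rank 1."""
    return sum(x) ** 2 / len(x)


def q2(x):
    """sum (x_i - xbar)^2, a quadratic form of rank n - 1."""
    xbar = sum(x) / len(x)
    return sum((v - xbar) ** 2 for v in x)


n = 6
H = helmert(n)
rng = random.Random(1)
x = [rng.uniform(-3, 3) for _ in range(n)]
z = [sum(h * xi for h, xi in zip(row, x)) for row in H]   # z = H x
print(q1(x), z[0] ** 2)
print(q2(x), sum(v * v for v in z[1:]))
```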
Among the hypotheses of the Fisher-Cochran theorem is the hypothesis that the mean value of $x_i$ is 0 and the variance of $x_i$ is $\sigma^2$. However, in connection with his analysis of the distribution of the multiple correlation coefficient,³ R. A. Fisher derived the distribution of the sum of the squares of $n$ independently distributed random variables $x_1, \ldots, x_n$, the probability density of $x_i$ being given by

(6) $$p(x_i) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(x_i - a_i)^2}{2\sigma^2}\right].$$

More recently, P. C. Tang⁴ has used the distribution of the sum of non-central squares in his study of the power function of the analysis of variance test. In this note we extend the Fisher-Cochran theorem to non-central random variables. If the random variables $x_i$ are independently distributed with probability densities given by (6), Fisher and Tang have shown that if $\chi'^2 = \frac{1}{\sigma^2}\sum_i x_i^2$, then the probability density of $\chi'^2$ is given by

(7) $$p(\chi'^2) = \frac{1}{2}e^{-\lambda - \chi'^2/2}\sum_{\nu=0}^{\infty}\frac{\lambda^\nu\left(\frac{1}{2}\chi'^2\right)^{n/2 + \nu - 1}}{\nu!\,\Gamma\!\left(\frac{n}{2} + \nu\right)},$$

where $\lambda = \frac{1}{2\sigma^2}\sum_i a_i^2$.
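Formula (7) is a Poisson mixture of central $\chi^2$ densities and is easy to evaluate. The Python sketch below (names and the truncation level are our choices) sums the series in log space and checks numerically, for $n = 4$ and $\lambda = 1.5$, that it integrates to one and has mean $n + 2\lambda$, the standard mean of the non-central $\chi^2$ in this parametrization.

```python
import math


def noncentral_chi2_pdf(x: float, n: int, lam: float, terms: int = 40) -> float:
    """Density (7) of chi'^2 with n degrees of freedom and lambda = sum a_i^2 / (2 sigma^2).

    p = (1/2) e^{-lam - x/2} sum_v lam^v (x/2)^{n/2+v-1} / (v! Gamma(n/2+v)),
    truncated after `terms` terms. Assumes n > 2, so the density vanishes at x = 0.
    """
    if x <= 0:
        return 0.0
    log_half_x = math.log(x / 2)
    total = 0.0
    for v in range(terms):
        if v > 0 and lam == 0:
            break                      # only the central term survives
        log_term = ((v * math.log(lam) if v > 0 else 0.0)
                    + (n / 2 + v - 1) * log_half_x
                    - math.lgamma(v + 1) - math.lgamma(n / 2 + v))
        total += math.exp(log_term)
    return 0.5 * math.exp(-lam - x / 2) * total


def trapezoid(f, a: float, b: float, steps: int) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        s += f(a + k * h)
    return s * h


n, lam = 4, 1.5
total_mass = trapezoid(lambda x: noncentral_chi2_pdf(x, n, lam), 0.0, 100.0, 10_000)
mean = trapezoid(lambda x: x * noncentral_chi2_pdf(x, n, lam), 0.0, 100.0, 10_000)
print(total_mass, mean)   # close to 1 and to n + 2*lam = 7
```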

We now give necessary and sufficient conditions that a set of quadratic forms in normally and independently distributed random variables should themselves be independently distributed in $\chi'^2$-distributions.

Theorem II. Let $x_1, \ldots, x_n$ be independently distributed random variables, the random variable $x_i$ having probability density (6). Denote $\sum_i x_i^2$ by $q$. Let $q_\gamma = \sum_{\mu,\nu} a^\gamma_{\mu\nu} x_\mu x_\nu$ ($\gamma = 1, \ldots, m$) be quadratic forms such that $\sum_\gamma q_\gamma = q$, and let the rank of $q_\gamma$ be denoted by $n_\gamma$.


³R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation Coefficient," Proc. Roy. Soc. London, Series A, Vol. 121 (1928), pp. 664-673.

⁴P. C. Tang, "The Power Function of the Analysis of Variance Tests with Tables and Illustrations of their Use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-149.
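A necessary and sufficient condition that the quadratic forms $\chi'^2_\gamma$ ($\chi'^2_\gamma = q_\gamma/\sigma^2$) be independently distributed with joint probability density

(8) $$p(\chi'^2_1, \ldots, \chi'^2_m) = \prod_\gamma p(\chi'^2_\gamma),$$

where $p(\chi'^2_\gamma)$ is given by (7) with $n_\gamma$ and $\lambda_\gamma$ in place of $n$ and $\lambda$, and

(9) $$\lambda_\gamma = \frac{1}{2\sigma^2}\sum_{\mu,\nu} a^\gamma_{\mu\nu}\, a_\mu a_\nu,$$

is $n' = n$.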

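Proof. Necessity. Tang⁵ has shown that the distribution of $\chi'^2$ is given by (7) and that if the $\chi'^2_\gamma$ have joint distribution (8), then the distribution of $\chi'^2_1 + \cdots + \chi'^2_m$ ($= \chi'^2$) is (7) with $n'$ in place of $n$. Upon comparing terms, we see that $n' = n$.

Sufficiency. By Theorem I there exist $n$ orthogonal linear functions (3) such that (2) is true. Then it is easy to see that the random variables $z_1, \ldots, z_n$ are independently distributed with joint probability density

(10) $$p(z_1, \ldots, z_n) = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_\beta(z_\beta - a'_\beta)^2\right],$$

where

$$\sum_\beta a'^2_\beta = \sum_i a_i^2 \quad\text{and}\quad a'_\beta = \sum_i c_{i\beta}\, a_i.$$

If we set $2\sigma^2\lambda_\gamma = \sum_\alpha a'^2_\alpha$, then we have, from (7) and (10), that the $\chi'^2_\gamma$ are independently distributed with joint probability density (8). It is only necessary to show that $\sum_\alpha a'^2_\alpha = \sum_{\mu,\nu} a^\gamma_{\mu\nu} a_\mu a_\nu$ in order to complete the proof of the theorem. Now

$$\sum_\alpha a'^2_\alpha = \sum_{i,j}\Big(\sum_\alpha c_{i\alpha} c_{j\alpha}\Big) a_i a_j.$$

On the other hand, by direct substitution for the $z$'s we see that

$$q_\gamma = \sum_\alpha z_\alpha^2 = \sum_{\mu,\nu}\Big(\sum_\alpha c_{\mu\alpha} c_{\nu\alpha}\Big) x_\mu x_\nu,$$

and hence $a^\gamma_{\mu\nu} = \sum_\alpha c_{\mu\alpha} c_{\nu\alpha}$. Here we have used the fact that, since (3) is an orthogonal transformation, its inverse is $z_\beta = \sum_i c_{i\beta} x_i$, i.e.

$$\sum_i c_{i\alpha} c_{i\beta} = \delta_{\alpha\beta},$$

where $\delta_{\alpha\beta} = 0$ if $\alpha \ne \beta$ and $= 1$ if $\alpha = \beta$, which completes the proof.

It is emphasized that the form of $\lambda_\gamma$ makes it unnecessary to calculate the matrix of the transformation (3) to determine $\lambda_\gamma$, since the values $a_i$ need only be substituted for the $x_i$ in the original expression for $q_\gamma$ to determine $\lambda_\gamma$.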
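The remark can be illustrated with $q_1 = n\bar x^2$ and $q_2 = \sum_i (x_i - \bar x)^2$, whose ranks $1$ and $n - 1$ add to $n$. In the Python sketch below (hypothetical means $a_i$ and $\sigma$; names ours), each $\lambda_\gamma$ is obtained simply by evaluating $q_\gamma$ at the $a_i$; the two values add to $\sum_i a_i^2/(2\sigma^2)$, and a seeded Monte Carlo run checks that $q_\gamma/\sigma^2$ has mean close to $n_\gamma + 2\lambda_\gamma$.

```python
import random


def q_mean(x):
    """q_1 = n * xbar^2 (rank 1)."""
    return sum(x) ** 2 / len(x)


def q_dev(x):
    """q_2 = sum (x_i - xbar)^2 (rank n - 1); q_1 + q_2 = sum x_i^2."""
    xbar = sum(x) / len(x)
    return sum((v - xbar) ** 2 for v in x)


def lambdas(a, sigma):
    """lambda_gamma = q_gamma(a) / (2 sigma^2): substitute the means a_i for the x_i."""
    return q_mean(a) / (2 * sigma ** 2), q_dev(a) / (2 * sigma ** 2)


def mc_mean(q, a, sigma, reps=50_000, seed=0):
    """Monte Carlo mean of q(x) / sigma^2 with independent x_i ~ N(a_i, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(ai, sigma) for ai in a]
        total += q(x) / sigma ** 2
    return total / reps


a = [1.0, 0.5, -0.3, 2.0, 0.0]   # hypothetical means a_i
sigma = 1.5
lam1, lam2 = lambdas(a, sigma)
print(lam1 + lam2, sum(v * v for v in a) / (2 * sigma ** 2))
print(mc_mean(q_dev, a, sigma), len(a) - 1 + 2 * lam2)
```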

Washington, D. C.


⁵See the paper cited in footnote 4, p. 140.
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TWO PROPERTIES OF SUFFICIENT STATISTICS 


By Louis Olshevsky 


The concept of sufficient statistics was introduced by R. A. Fisher in 1922. It was refined and extended in 1936 by Neyman and Pearson, who gave definitions of shared sufficient statistics and of sufficient sets of algebraically independent statistics.¹ Today the concept plays an important part in the theory of the subject. Characterized briefly, a statistic associated with a single or specific population parameter is sufficient when no other statistic calculated from the same sample sheds any additional light on the value of the parameter. We shall prove that sets of sufficient statistics possess certain interconnections, so that when one set is known, every other set with a like number of members and linked with the same population parameters is discoverable.

Theorem I. If $T_1, \ldots, T_m$ are a set of $m$ ($m \le n$) algebraically independent sufficient statistics with regard to the parameters $\theta_1, \ldots, \theta_q$ and the probability law $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, a necessary and sufficient condition for the sufficiency of any set of $m$ algebraically independent statistics $T'_1, \ldots, T'_m$ with regard to the same parameters and the same probability distribution is that the $T'_j$ be a set of independent functions of the $T_i$ ($i, j = 1, \ldots, m$).

Proof. As an adjunct in the demonstration we cite the following theorem due to Neyman:² For a set of algebraically independent statistics $T_1, \ldots, T_m$ to be a sufficient set with regard to the parameters $\theta_1, \ldots, \theta_q$, it is necessary and sufficient that at any point of sample space, except perhaps for a set of measure zero, it should be possible to present the probability law in the form of the product

(1) $$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)\,\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l),$$

where $p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$ is the probability law of $T_1, \ldots, T_m$ and the function $\phi$ does not depend upon $\theta_1, \ldots, \theta_q$.
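A small numerical illustration of (1), using a Poisson example of our own choosing ($l = q = m = 1$). With $T = \sum_i x_i$, which has a Poisson law with mean $n\theta$, the factor $\phi = p(x \mid \theta)/p(T \mid \theta)$ should equal $T!/(n^T \prod_i x_i!)$ whatever the value of $\theta$.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """log of the Poisson probability e^{-mu} mu^k / k!."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


def phi(xs, theta):
    """phi = p(x | theta) / p(T | theta) for T = sum(xs), T ~ Poisson(n * theta)."""
    n, T = len(xs), sum(xs)
    log_joint = sum(poisson_logpmf(x, theta) for x in xs)
    return math.exp(log_joint - poisson_logpmf(T, n * theta))


xs = [2, 0, 3, 1, 4]   # hypothetical sample
for theta in (0.5, 1.7, 4.0):
    print(theta, phi(xs, theta))   # the same value for every theta
```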

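The sufficiency of the condition stated in the hypothesis of Theorem I is now immediately evident. For, if $p'$ and $\phi'$ refer to the second set of algebraically independent statistics and $T'_j = T'_j(T_1, \ldots, T_m)$, where the functions are independent, the relations can be solved for the $T_i$ in terms of the $T'_j$, giving

$$p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q) = p\big[T_1(T'_1, \ldots, T'_m), \ldots, T_m(T'_1, \ldots, T'_m) \mid \theta_1, \ldots, \theta_q\big]\left|\frac{\partial(T_1, \ldots, T_m)}{\partial(T'_1, \ldots, T'_m)}\right|,$$

$$\phi'(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l) = \phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)\left|\frac{\partial(T_1, \ldots, T_m)}{\partial(T'_1, \ldots, T'_m)}\right|^{-1},$$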


¹See Neyman and Pearson, "Sufficient Statistics and Uniformly Most Powerful Tests of Statistical Hypotheses," Statistical Research Memoirs of the University of London, June 1936. The notation of the present paper is taken from this article.

²See Neyman's article in the Giornale dell'Istituto Italiano degli Attuari, Vol. VI, No. 4 (1935), as well as the memoir referred to in footnote 1.
and

(2) $$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)\,\phi'(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l).$$
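The role of the Jacobian when passing from $T$ to a one-to-one function $T'$ can be checked numerically. A hedged sketch with an example of our own choosing: exponential observations with rate $\theta$ ($l = q = m = 1$), $T = \sum_i x_i$ with a Gamma$(n, \theta)$ law, and $T' = \log T$. Both $\phi = p/p_T$ and $\phi' = p/p_{T'}$ should be free of $\theta$, with $\phi' = \phi/T$ because $\partial T/\partial T' = T$.

```python
import math


def log_p_sample(xs, theta):
    """log p(x | theta) for independent exponential observations with rate theta."""
    return len(xs) * math.log(theta) - theta * sum(xs)


def log_p_T(t, n, theta):
    """log density of T = sum(x), a Gamma law with shape n and rate theta."""
    return n * math.log(theta) + (n - 1) * math.log(t) - theta * t - math.lgamma(n)


def log_p_Tprime(tp, n, theta):
    """log density of T' = log T: p'(t') = p_T(e^{t'}) * e^{t'} (Jacobian dT/dT' = T)."""
    return log_p_T(math.exp(tp), n, theta) + tp


xs = [0.8, 2.5, 0.3, 1.1]   # hypothetical sample
n, T = len(xs), sum(xs)


def factor_pair(theta):
    """(phi, phi') for the statistics T and T' = log T at parameter value theta."""
    phi = math.exp(log_p_sample(xs, theta) - log_p_T(T, n, theta))
    phi_prime = math.exp(log_p_sample(xs, theta) - log_p_Tprime(math.log(T), n, theta))
    return phi, phi_prime


for theta in (0.4, 1.0, 3.0):
    print(theta, *factor_pair(theta))
```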

Proof of the necessity is somewhat more involved. Since the $T_i$ and $T'_j$ are both sets of algebraically independent sufficient statistics with regard to $\theta_1, \ldots, \theta_q$, equations (1) and (2) are satisfied. They are, in fact, identities when the values of $T_1, \ldots, T_m$ and $T'_1, \ldots, T'_m$ in terms of the $x_i$ are substituted. Division of (1) by (2) and cross-multiplication lead to the equation

(3) $$\frac{p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)}{p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)} = \frac{\phi'(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)}{\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)}.$$

The right side of (3) is free of $\theta_1, \ldots, \theta_q$. Therefore, in reality, the left side must be too. If some or all of the parameters $\theta_1, \ldots, \theta_q$ enter formally into the left side, we can choose $m + 1$ sets of values $\theta^i_1, \ldots, \theta^i_q$ ($i = 1, \ldots, m + 1$) such that each of the $m + 1$ functions $p(T_1, \ldots, T_m \mid \theta^i_1, \ldots, \theta^i_q) \div p'(T'_1, \ldots, T'_m \mid \theta^i_1, \ldots, \theta^i_q)$ differs formally from all of the others. We can then, since each is equal to the right side of (3), which is free of $\theta_1, \ldots, \theta_q$, equate any one of these functions to the remaining $m$ in turn. This provides $m$ independent equations whose very existence proves that the $T'_j$ are functions of the $T_i$, and vice versa.

If none of the parameters $\theta_1, \ldots, \theta_q$ enters formally into the left side of (3), $p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$ must be of the form $p(T_1, \ldots, T_m)\,g(\theta_1, \ldots, \theta_q)$ and $p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)$ of the form $p'(T'_1, \ldots, T'_m)\,g(\theta_1, \ldots, \theta_q)$. In this case the original probability law $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$ contains $\theta_1, \ldots, \theta_q$ only nominally, and there can be no talk of any statistics designed to estimate these parameters, either singly or in combination.

When $m = 1$ and the set of algebraically independent statistics reduces to one, the single statistic is termed a shared sufficient statistic of the parameters $\theta_1, \ldots, \theta_q$.³ For this special case, Theorem I can be restated as follows: If $T$ is a shared sufficient statistic with regard to the population parameters $\theta_1, \ldots, \theta_q$ and the probability distribution $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, the necessary and sufficient condition for the sufficiency of any statistic $T'$ with regard to the same parameters and the same probability distribution is that $T'$ be a function of $T$. When $m$ and $q$ both equal one, the statistic becomes a sufficient statistic in the sense originally defined by Fisher in 1922.

A physical law is independent of the coordinate system used to express it. This fact is taken account of in modern physics through the employment of tensors. One might hope for a parallel situation in the relation between sufficient statistics and the probability law to which they refer. Given any $l$-parameter family of distribution laws $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_l)$, the substitution $\theta_i = \theta_i(\theta'_1, \ldots, \theta'_l)$ ($i = 1, \ldots, l$) leads to the equally valid representation of the family

$$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_l) = p\big[x_1, \ldots, x_n \mid \theta_1(\theta'_1, \ldots, \theta'_l), \ldots, \theta_l(\theta'_1, \ldots, \theta'_l)\big].$$

³See the memoir mentioned in footnote 1.

Is a set of statistics sufficient with respect to the first representation also sufficient with respect to the second? The answer is partly in the affirmative and is given by the following proposition.

Theorem II. If the set of algebraically independent statistics $T_1, \ldots, T_m$ is sufficient with regard to the parameters $\theta_1, \ldots, \theta_q$ and the probability law $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, it is also sufficient with regard to $\theta'_1, \ldots, \theta'_q$ and any other representation $p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_q, \ldots, \theta'_l)$ of the same probability law, provided the $\theta'_i$ ($i = 1, \ldots, q$) are independent functions of $\theta_1, \ldots, \theta_q$ only and the $\theta'_j$ ($j = q + 1, \ldots, l$) are functions of $\theta_{q+1}, \ldots, \theta_l$ only.

Proof. The proof of the theorem is obvious. We are given the fact that

$$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)\,\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l).$$

Since the $\theta'_i$ ($i = 1, \ldots, q$) are functions of $\theta_1, \ldots, \theta_q$ only and the $\theta'_j$ ($j = q + 1, \ldots, l$) are functions of $\theta_{q+1}, \ldots, \theta_l$ only, it follows that $\theta_i = \theta_i(\theta'_1, \ldots, \theta'_q)$ ($i = 1, \ldots, q$) and $\theta_j = \theta_j(\theta'_{q+1}, \ldots, \theta'_l)$ ($j = q + 1, \ldots, l$). Consequently,

$$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_q, \ldots, \theta'_l) = p'(T_1, \ldots, T_m \mid \theta'_1, \ldots, \theta'_q)\,\phi'(x_1, \ldots, x_n;\ \theta'_{q+1}, \ldots, \theta'_l),$$

and the theorem is established.

New York, N. Y. 


NOTE ON THE MOMENTS OF A BINOMIALLY DISTRIBUTED VARIATE 

By W. D. Evans 

J. A. Joseph has given two interesting triangular arrangements of numbers, the second of which is reproduced herewith as Table 1.* The successive rows in this table are the coefficients in the expansion of $x^r$ as a function of the factorials, using the notation of the calculus of finite differences. For example,

$$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x^{(1)},$$

where

$$x^{(i)} = x(x-1)(x-2)\cdots(x-i+1).$$

Joseph points out that the coefficients may be used to generate the numbers of Laplace.


*J. A. Joseph, "On the Coefficients of the Expansion of …," Annals of Math. Stat., Vol. X (1939), p. 293.
A general expression defining any of the coefficients in terms of its place of occurrence in Table 1 may be set up. If we denote by $F_c(r)$ the number in row $r$ and column $c$ of the table, we have

(1) $$F_c(r) = \sum_{k_1=1}^{r-c+1} k_1 \sum_{k_2=1}^{k_1} k_2 \sum_{k_3=1}^{k_2} k_3 \cdots \sum_{k_{c-1}=1}^{k_{c-2}} k_{c-1} \qquad (r \ge c).$$
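Equation (1) translates directly into a recursive nested sum. The following Python sketch (the names `nested_sum`, `F` and `falling` are ours) rebuilds the first rows of Table 1 and checks the expansion of $x^r$ in factorial powers for one value of $x$. These numbers are the Stirling numbers of the second kind, a name the note does not use.

```python
def nested_sum(depth: int, upper: int) -> int:
    """sum_{k1=1}^{upper} k1 sum_{k2=1}^{k1} k2 ... with `depth` levels; 1 when depth == 0."""
    if depth == 0:
        return 1
    return sum(k * nested_sum(depth - 1, k) for k in range(1, upper + 1))


def F(c: int, r: int) -> int:
    """Entry in row r, column c of Table 1 via equation (1), for r >= c."""
    return nested_sum(c - 1, r - c + 1)


def falling(x: int, k: int) -> int:
    """Factorial power x^(k) = x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


for r in range(1, 6):
    print([F(c, r) for c in range(1, r + 1)])

# Row r gives x^r = sum_c F_c(r) x^(r-c+1); check it for r = 4, x = 7.
x, r = 7, 4
print(x ** r, sum(F(c, r) * falling(x, r - c + 1) for c in range(1, r + 1)))
```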


TABLE 1

r \ c     1        2        3        4        5      ...
1         1
2         1        1
3         1        3        1
4         1        6        7        1
5         1       10       25       15        1
...
r       $F_1(r)$  $F_2(r)$  $F_3(r)$  ...  $F_c(r)$

This expression is of additional interest since the numbers defined by it are likewise the coefficients in the expression of the $s$-th moment about the origin of a binomially distributed variate in terms of the probability of the variate and the size of the sample in which it is contained. For example, it may be easily verified that if $a$ is such a variate, $p$ its probability of occurrence, and $n$ the size of the sample in which it is contained,

$$E(a^2) = n^{(2)}p^2 + np,$$
$$E(a^3) = n^{(3)}p^3 + 3n^{(2)}p^2 + np,$$
$$E(a^4) = n^{(4)}p^4 + 6n^{(3)}p^3 + 7n^{(2)}p^2 + np,$$

and so on.

Ordinarily, computation of the higher moments of a binomially distributed variate is a tedious process of repeated differentiation. However, equation (1) immediately permits us to generalize the foregoing expressions to give the $s$-th moment of $a$ as follows:

(2) $$E(a^s) = \sum_{i=0}^{s-1} n^{(s-i)} p^{s-i} \sum_{k_1=1}^{s-i} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_i=1}^{k_{i-1}} k_i.$$

It will be noted that when $c - 1$ in equation (1) and $i$ in equation (2) are equal to zero, the repeated summations vanish, to be replaced by the value one.

By means of equation (2) much of the labor usually involved in expressing the $s$-th moment about the origin of a binomially distributed variate in terms of $n$ and $p$ may be avoided.
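Equation (2) can be checked against direct summation over the binomial distribution. The sketch below (names ours) uses exact rational arithmetic via `fractions.Fraction`, so the comparison is exact rather than approximate.

```python
from fractions import Fraction
from math import comb


def nested_sum(depth: int, upper: int) -> int:
    """The repeated summation of equations (1) and (2); 1 when depth == 0."""
    if depth == 0:
        return 1
    return sum(k * nested_sum(depth - 1, k) for k in range(1, upper + 1))


def falling(x: int, k: int) -> int:
    """Factorial power x^(k) = x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def moment_eq2(s: int, n: int, p: Fraction) -> Fraction:
    """s-th moment about the origin of a binomial(n, p) variate via equation (2)."""
    return sum(falling(n, s - i) * p ** (s - i) * nested_sum(i, s - i) for i in range(s))


def moment_direct(s: int, n: int, p: Fraction) -> Fraction:
    """The same moment by direct summation over the binomial distribution."""
    return sum(k ** s * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))


p = Fraction(1, 3)
for s in range(1, 7):
    print(s, moment_eq2(s, 10, p), moment_direct(s, 10, p))
```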


Washington, D. C.



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE 

The fifth annual meeting of the Institute of Mathematical Statistics was held in Philadelphia, Pennsylvania, on December 27 and 28, 1939, in conjunction with the meetings of the American Statistical Association, the Econometric Society, and the American Sociological Society. The program for the meeting was arranged by Professor C. C. Craig.

On Wednesday morning, December 27, the Institute held a session devoted to 
contributed papers on Statistical Theory and Methodology. Professor P. R. 
Rider, President of the Institute, presided. At that time the following papers 
were presented: 

1. On the unbiased character of certain likelihood-ratio tests when applied to normal systems.

Joseph F. Daly, The Catholic University of America.

2. The product seminvariants of the mean and a central moment in samples.

C. C. Craig, University of Michigan.

3. A method for minimizing the sum of absolute values of deviations.

Robert Singleton, Princeton Local Government Survey.

4. On certain criteria for testing the homogeneity of k estimates of variance.

C. Eisenhart and Frieda S. Swed, University of Wisconsin.

5. On a test whether two samples are from the same population.

A. Wald and J. Wolfowitz, Columbia University and Brooklyn, New York.

6. The power functions of certain tests of significance in harmonic analysis and lag correlation.

William G. Madow, Washington, D. C.

7. Some theoretical aspects of the use of transformations in the statistical analysis of replicated experiments.

W. G. Cochran, Iowa State College.

8. The standard errors of geometric and harmonic types of index numbers.

Nilan Norris, Hunter College.

9. A study of R. A. Fisher's z distribution and the related F distribution.

L. A. Aroian, Hunter College.

10. A note on the analysis of variance with unequal class frequencies.

Abraham Wald, Columbia University.

11. An approach to problems involving disproportionate frequencies.

Burton D. Seeley, U. S. Department of Labor.

Abstracts of these papers are given at the close of this report, 

Immediately following the session just described, the Institute held its annual business meeting. At that time President Rider announced that the newly elected officers for the year 1940 are: President, S. S. Wilks, Princeton University; Vice-Presidents, C. C. Craig, University of Michigan, and A. T. Craig, University of Iowa; Secretary-Treasurer, P. R. Rider, Washington University.
At one o'clock on the same day, members of the Institute and their guests 
attended the annual luncheon. At the luncheon, Professor B. H. Camp addressed the Institute on Non-standard Deviations.

On Wednesday afternoon, the Institute met jointly with the American Statistical Association for a program devoted to Lag Effects in Statistics and Economics. Professor J. D. Tamarkin presided, and at this time the following papers were read:

1. Lag effects in statistics and related problems.

A. J. Lotka, Metropolitan Life Insurance Company.

2. Some methods in the analysis of lag effects.

H. T. Davis, Northwestern University.

3. Lag effects in economics.

Charles F. Roos, Institute of Applied Econometrics, Inc.

A joint session with the Biometric Section of the American Statistical Association was held on Wednesday evening, Professor George W. Snedecor presiding. The papers presented at this session, which dealt with Design and Analysis of Replicated Experiments, were the following:

1. Practical difficulties met in the use of experimental designs.

A. E. Brandt, Soil Conservation Service.

2. Factorial design and covariance in the biological assay of vitamin D.

C. I. Bliss, Sandusky, Ohio.

3. Combinatorial problems in the design of experiments.

Gertrude M. Cox, Iowa State College.

4. Experimental trials with balanced incomplete blocks.

W. J. Youden, Boyce Thompson Institute.

On Thursday afternoon the Institute held consecutively joint sessions with the American Sociological Society and the Econometric Society. At the first of these, Professor William F. Ogburn presided and the following program was presented:

1. How the mathematician can help the sociologist.

Samuel A. Stouffer, University of Chicago.

2. Some problems of combinations and permutations as they apply to a comprehensive classification of social groups.

George A. Lundberg, Bennington College.

Discussion: C. C. Craig, University of Michigan.

Philip M. Hauser, U. S. Bureau of the Census.

At the second session the topic for discussion was Recent Advances in Business Cycle Analysis, and these papers were given:

1. Recursive methods in business cycle analysis.

Merrill M. Flood, Princeton Surveys.

2. An appreciation of some recent mathematical business cycle theories.

Gerhard Tintner, Iowa State College.

3. The statisticians' new clothiers.

Arne Fisher, Western Union Telegraph Company.


Paul R. Rider, Secretary. 



ABSTRACTS OF PAPERS 

(Presented on December 27, 1939, at the Philadelphia meeting of the Institute)

On the Unbiased Character of Certain Likelihood-Ratio Tests when Applied to Normal Systems. Joseph F. Daly, The Catholic University of America.

Consider a random sample of $N$ observations on a set of variates $x^1, x^2, \ldots$, where $x^1, \ldots, x^k$ are assumed to be normally distributed about means which are linear functions $m^i = \sum_j b_{ij} x^j$ of the fixed variates $x^{k+1}, \ldots$. One is sometimes required to decide whether the sample tends to contradict the further hypothesis, $H_0$, that the coefficients $b_j$ belonging to a certain subset of the fixed variates, say $x^{k+1}, \ldots$, have the specific values $b_{j0}$. Such a situation occurs, for example, in the generalized analysis of variance. In this paper it is shown that the Neyman-Pearson method of the ratio of likelihoods yields a test of $H_0$ which is (at least locally) unbiased; in other words, this test is less likely to reject $H_0$ when the sample is in fact drawn from a normal population in which $b_j = b_{j0}$ than when it is drawn from a normal population in which the $b_j$ are different from, but sufficiently close to, $b_{j0}$.
In the special oases A = 1 or A = 1 the proof goes through even without the restriction that 
the true $b_j$ be close to $b_{j0}$, a result which is also implicit in the papers by P. C. Tang and P. L. Hsu (Stat. Res. Mem., Vol. 2).

Similarly, with respect to the hypothesis $H_1$ that the deviations $x^i - \sum_j b_{ij} x^j$ fall into certain mutually independent sets, the $\lambda$-test is at least locally unbiased; and it has the
additional property that the expected value of any poai tive integral power of v X is greater 
when $H_1$ is true than when the sample is drawn from any other normal population.

The Product Seminvariants of the Mean and a Central Moment in Samples. C. C. Craig, The University of Michigan.

The method used by the author in calculating the product seminvariants of a pair of central moments in samples is not adapted without modification to the present problem. In the present paper the necessary modification is developed, which gives a routine method for the calculation of these sampling distribution characteristics. The calculation is a little heavier than in the previous case, but the results for the mean and the second, third, and fourth central moments are given up to the fourth order, except in one case in which the weight is 13. It is planned to follow this with a further study of the distribution of Fisher's $t$ in samples from a normal population.

A Method for Minimizing the Sum of Absolute Values of Deviations. Robert Singleton, Princeton Local Government Survey.

E. C. Rhodes (Philosophical Magazine, May 1930) presented a method for the estimation of parameters in a linear regression where it is desired to minimize the sum of absolute values of the deviations. In this paper the structure of the deviation surface is analyzed, and a method of steepest descent is developed which, for computational purposes, is an improvement over Rhodes' method. The process is finite and leads to an exact solution. The method and the formulae used are such as to permit the successive addition of new observations or sets of observations to the original data, or the exclusion of an observation from the original set, and the determination of the parameters for the sets of data so derived, with little additional labor.
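The abstract does not reproduce the steepest-descent procedure, so the following is only a brute-force illustration of the problem being solved, not Rhodes' or Singleton's method. For a line $y = a + bx$, the sum of absolute deviations is convex and piecewise linear in $(a, b)$; when at least two $x$ values differ, a minimum is attained at a line through two data points, so checking every such line gives an exact (if $O(n^3)$) solution. The data are hypothetical.

```python
from itertools import combinations


def sum_abs_dev(a, b, xs, ys):
    """Sum of absolute deviations of the points from the line y = a + b x."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))


def lad_line_by_vertices(xs, ys):
    """Exact least-absolute-deviations line by trying every line through two points.

    Returns (minimum sum of absolute deviations, a, b). O(n^3): a check, not a method."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue                      # vertical lines are not of the form a + b x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        cost = sum_abs_dev(a, b, xs, ys)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.9, 5.2, 7.1, 8.8, 30.0]   # hypothetical data with one outlier
cost, a, b = lad_line_by_vertices(xs, ys)
print(cost, a, b)
```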
On Certain Criteria for Testing the Homogeneity of $k$ Estimates of Variance. C. Eisenhart and Frieda S. Swed, University of Wisconsin.

Given $k$ variance estimates $s_1^2, s_2^2, \ldots, s_k^2$ with $n_r s_r^2/\sigma_r^2$ ($r = 1, 2, \ldots, k$) independently distributed as $\chi^2$ for $n_r$ degrees of freedom, tests of the hypothesis $H_0$ that $\sigma_r^2 = \sigma^2$ ($r = 1, 2, \ldots, k$), where $\sigma^2$ is unknown, have been based to date on one or the other of the quantities

$$Q_1 = \sum_{r=1}^{k}\frac{n_r(s_r^2 - s^2)^2}{2s^4},$$

$$Q_2 = w\log\left(\frac{\sum_r w_r s_r^2}{w}\right) - \sum_{r=1}^{k} w_r\log s_r^2,$$

where the $w_r$ are weights and $w = \sum_r w_r$. A. E. Brandt and W. L. Stevens have advocated the use of $Q_1$, referring an observed value of $Q_1$ to the $\chi^2$ distribution for $k - 1$ degrees of freedom. J. Neyman, E. S. Pearson, B. L. Welch, and M. S. Bartlett have advocated tests based on $Q_2$, Bartlett definitely proposing the use of degrees of freedom as weights, i.e. $w_r = n_r$; and recent work of E. J. G. Pitman and others has shown that unless $w_r = n_r$, tests based on $Q_2$ are biased. (A statistical test of an hypothesis $H$ is said to be unbiased when the probability of rejecting $H$ by its use is a minimum when $H$ is true, obviously a desirable property.) When $w_r = n_r$, Bartlett has suggested that the distribution of $Q_2$ can be satisfactorily approximated by referring

$$\frac{Q_2}{1 + \dfrac{\sum_r (1/n_r) - 1/\sum_r n_r}{3(k-1)}}$$

to the $\chi^2$ distribution for $k - 1$ degrees of freedom. In this paper we discuss the adequacy of the $\chi^2$ distribution to describe the distribution of $Q_1$ and of the adjusted $Q_2$ when the degrees of freedom, $n_r$, are small.

U. S. Nair and D. J. Bishop have given theoretical evidence which suggests that when
$n_r > 2$ ($r = 1, 2, \dots, k$), Bartlett's adjusted $Q_2$ may be expected to conform to the $\chi^2$
distribution reasonably well in the neighborhood of the 5% and 1% levels. Using 1000
samples of 4 for which $n_r s_r^2/(n_r + 1)$ has been tabulated by W. A. Shewhart in Table D,
Appendix II of his "Economic Control of Quality of Manufactured Product," 200 values of
$Q_1$ and $Q_2$ (with adjustment) were calculated and compared with the $\chi^2$ distribution for
$k - 1$ degrees of freedom. Two cases were studied: Case I, $k = 5$ and $n_1 = n_2 = \dots = n_5 = 3$;
Case II, $k = 3$ and $n_1 = n_2 = 3$ while $n_3 = 9$. As measured by the Chi-Square Goodness of
Fit Test, using 11 degrees of freedom, the fits were good in all four instances. In Case I,
for Bartlett's adjusted $Q_2$ the test led to $.80 < P < .90$, and to $.70 < P < .80$ for the Brandt-Stevens
$Q_1$; in Case II, the fits were poorer, with $.50 < P < .70$ for Bartlett's criterion and
$.10 < P < .20$ for the Brandt-Stevens $Q_1$. However, an examination of the descending cumulative
distributions showed that in all instances these criteria exhibited a deficiency of large
values of $\chi^2$, the deficiency being, in general, more marked for the Brandt-Stevens
test. Consequently, when one uses significance levels for these criteria obtained by means
of the advocated $\chi^2$ approximation, one is in reality using a level of significance slightly
less than that professed. The discrepancy is not great, however, and is on the safe side, i.e.
one will reject $H_0$ falsely in the long run less often than one professes to be doing. Without
doubt, however, one will also detect the falsehood of $H_0$ when $\sigma_r^2 \neq \sigma_s^2$ for at least one pair
of values of $r$ and $s$ less often in the long run by the use of these approximate significance
levels than if the true levels were used, but we have no definite evidence at present
on this point. A somewhat disquieting feature is that the agreement between the $\chi^2$ values
yielded by the two criteria becomes worse as one proceeds toward larger values of $\chi^2$ in
terms of either quantity. Thus, of 8 samples which $Q_2$ would have rejected at the 5% level
in Case I, only 4 would have been rejected by $Q_1$, and $Q_2$ would have passed 3
of the 7 samples rejected by $Q_1$. It therefore appears that, if one wishes to work with a given
chance of rejecting $H_0$ falsely, one should choose one of these criteria and then stick to it in
future applications. For large values of the $n_r$ the two criteria tend to equivalence, so the
choice between them is of interest mainly for small $n_r$, but it cannot be made with full
information until more is known about the bias, if any, of the Brandt-Stevens test, and about the
relative power of the two tests with regard to alternatives to $H_0$.
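The two criteria can be computed directly. The minimal sketch below uses Bartlett's weights $w_r = n_r$, so $w = \sum n_r$. It takes the pooled estimate $\bar s^2 = \sum n_r s_r^2 / \sum n_r$ for $Q_1$; that definition is an assumption, since the abstract does not spell it out. The function names are illustrative. Either statistic would then be referred to $\chi^2$ with $k - 1$ degrees of freedom.

```python
import math

def pooled_variance(s2, n):
    # Pooled estimate sum(n_r s_r^2) / sum(n_r); assumed to be the s-bar^2 used in Q1.
    return sum(nr * s for nr, s in zip(n, s2)) / sum(n)

def q1_brandt_stevens(s2, n):
    sbar2 = pooled_variance(s2, n)
    return sum(nr * (s - sbar2) ** 2 for nr, s in zip(n, s2)) / (2 * sbar2 ** 2)

def bartlett_correction(n):
    # 1 + (sum 1/n_r - 1/sum n_r) / (3(k - 1))
    k = len(n)
    return 1 + (sum(1 / nr for nr in n) - 1 / sum(n)) / (3 * (k - 1))

def q2_bartlett(s2, n, adjust=True):
    # Q2 with weights w_r = n_r, so that w = sum(n_r).
    w = sum(n)
    q2 = w * math.log(pooled_variance(s2, n)) - sum(nr * math.log(s) for nr, s in zip(n, s2))
    return q2 / bartlett_correction(n) if adjust else q2

# Case I layout of the abstract: k = 5 estimates, each with 3 degrees of freedom
s2 = [1.2, 0.4, 2.5, 0.9, 1.6]   # illustrative variance estimates
n = [3] * 5
print(q1_brandt_stevens(s2, n), q2_bartlett(s2, n))
```

Both statistics vanish when all the $s_r^2$ are equal. By concavity of the logarithm, the unadjusted $Q_2$ is never negative.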


On a Test Whether Two Samples are from the Same Population. A. Wald
and J. Wolfowitz, Columbia University and Brooklyn, New York.

Let $X$ and $Y$ be two independent random variables about whose distributions nothing is
known except that they are continuous. Let $x_1, x_2, \dots, x_m$ be a set of $m$ independent
observations on $X$ and let $y_1, y_2, \dots, y_n$ be a set of $n$ independent observations on $Y$.
The null hypothesis to be tested is that the distributions of $X$ and $Y$ are identical.

Let the set of $m + n$ observations be arranged in order of magnitude, thus: $z_1, z_2, \dots,
z_{m+n}$. Replace $z_i$ by $v_i$ ($i = 1, 2, \dots, m + n$), where $v_i = 0$ if $z_i$ is a member of the set of
$x$'s and $v_i = 1$ if $z_i$ is a member of the set of $y$'s. Since the null hypothesis states only that
the distributions of $X$ and $Y$ are identical without specifying them in any other way, the
distribution of the statistic used for testing the null hypothesis must be independent of
this common distribution of $X$ and $Y$. It can easily be shown that such a statistic must be
a function only of the sequence $v_1, v_2, \dots, v_{m+n}$.

A subsequence $v_s, v_{s+1}, \dots, v_{s+r}$ (where $r$ may also be 0) is called a run if $v_s = v_{s+1} =
\dots = v_{s+r}$, and if $v_{s-1} \neq v_s$ when $s > 1$, and if $v_{s+r+1} \neq v_{s+r}$ when $s + r < m + n$. The
statistic $U$, defined as the number of runs in the sequence $v_1, v_2, \dots, v_{m+n}$, seems a suitable
statistic for testing the null hypothesis. A difference in the distribution functions of $X$
and $Y$ tends to decrease $U$. Hence the critical region is defined by the inequality $U < U_0$,
where $U_0$ depends only on $m$, $n$, and the level of significance adopted. If $m \le n$ and
$P\{U = c\}$ is the probability that $U = c$, then:


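$$P\{U = 2k\} = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}, \qquad (k = 1, 2, \dots, m),$$

$$P\{U = 2k - 1\} = \frac{\binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1}}{\binom{m+n}{m}}, \qquad (k = 2, 3, \dots, m + 1).$$

The mean of $U$ is

$$E(U) = \frac{2mn}{m + n} + 1.$$

The variance of $U$ is

$$\sigma_U^2 = \frac{2mn(2mn - m - n)}{(m + n)^2(m + n - 1)}.$$

If $m/n \to \alpha$ (a positive constant) and $n \to \infty$, the distribution of $U$ converges to the normal
distribution.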
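The run count and its exact null distribution translate directly into code. The sketch below counts $U$ for two samples, tabulates $P\{U = u\}$ from the two formulas for $m \le n$, and forms the lower-tail probability $P\{U \le u_{obs}\}$, since small $U$ is the critical region. It assumes there are no ties, which the continuity assumption makes an event of probability zero. The function names are hypothetical.

```python
from math import comb

def count_runs(x, y):
    """Number of runs U in the ordered combined sample (x -> 0, y -> 1); assumes no ties."""
    labels = [lab for _, lab in sorted([(v, 0) for v in x] + [(v, 1) for v in y])]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_pmf(m, n):
    """Exact null distribution {u: P(U = u)} from the formulas above."""
    if m > n:
        m, n = n, m  # U counts runs of both kinds, so its law is symmetric in m and n
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, m + 1):
        pmf[2 * k] = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    for k in range(2, m + 2):
        # math.comb returns 0 when the lower index exceeds the upper one
        pmf[2 * k - 1] = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                          + comb(m - 1, k - 2) * comb(n - 1, k - 1)) / total
    return pmf

def runs_mean_var(m, n):
    mean = 2 * m * n / (m + n) + 1
    var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return mean, var

def runs_p_value(x, y):
    """Observed U and the lower-tail probability P{U <= U_obs}."""
    u = count_runs(x, y)
    pmf = runs_pmf(len(x), len(y))
    return u, sum(p for k, p in pmf.items() if k <= u)

pmf = runs_pmf(3, 4)
mean, var = runs_mean_var(3, 4)
print(sum(pmf.values()), mean, var)
print(runs_p_value([1.1, 2.3, 3.0], [4.2, 5.5, 6.1]))
```

The exact table agrees with the closed-form mean and variance. For larger samples, the normal limit stated above replaces the table.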


The Distribution of Quadratic Forms in Non-Central Normal Random Variables. William G. Madow, Washington, D. C. (Presented to the Institute
under a slightly different title.)

Let the distribution of a sum of non-central squares of normally and independently
distributed random variables which have unit variances be called the $\chi'^2$ distribution.
It is proved that if a set of quadratic forms has a sum which is the sum of the squares of
their variables, then a necessary and sufficient condition that the quadratic forms be
independently distributed in $\chi'^2$ distributions is that the rank of the sum of the quadratic forms be
equal to the sum of the ranks of the quadratic forms. Furthermore, the constants on which
the $\chi'^2$ distributions depend may be obtained by substituting the values about which the
variables are taken for the variables themselves in the quadratic forms. Roughly speaking,
the theorem states that if a set of quadratic forms satisfies the conditions of the Fisher-Cochran
theorem when the true means vanish, then the set of quadratic forms will be
independently distributed in $\chi'^2$ distributions when the true means do not vanish.
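A concrete case of the theorem is $\sum x_i^2 = n\bar x^2 + \sum (x_i - \bar x)^2$ with independent $x_i \sim N(\mu_i, 1)$. The ranks are $1$ and $n - 1$, which add to $n$. Substituting the means for the variables gives the constants $n\bar\mu^2$ and $\sum(\mu_i - \bar\mu)^2$. Taking the constant as the sum of squared means (one common convention), a $\chi'^2$ variate with $r$ degrees of freedom has mean $r + \lambda$. The sketch below checks this by simulation. The means, seed, and replication count are illustrative choices.

```python
import random

def noncentralities(mu):
    """Constants for n*xbar^2 and sum (x_i - xbar)^2, obtained by substituting the means."""
    n = len(mu)
    mbar = sum(mu) / n
    return n * mbar ** 2, sum((m - mbar) ** 2 for m in mu)

def simulated_means(mu, reps=20000, seed=1):
    """Monte Carlo means of the two forms for independent x_i ~ N(mu_i, 1)."""
    rng = random.Random(seed)
    n = len(mu)
    tot1 = tot2 = 0.0
    for _ in range(reps):
        x = [rng.gauss(m, 1.0) for m in mu]
        xbar = sum(x) / n
        tot1 += n * xbar ** 2
        tot2 += sum((xi - xbar) ** 2 for xi in x)
    return tot1 / reps, tot2 / reps

mu = [1.0, 2.0, 0.0, -1.0]
lam1, lam2 = noncentralities(mu)    # 1.0 and 5.0, adding to sum(mu_i^2) = 6.0
# expected means: 1 + lam1 = 2.0 (rank 1) and 3 + lam2 = 8.0 (rank n - 1 = 3)
print(lam1, lam2, simulated_means(mu))
```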


Some Theoretical Aspects of the Use of Transformations in the Statistical
Analysis of Replicated Experiments. W. G. Cochran, Iowa State College.

The device of transforming the data to a different scale before performing an analysis of
variance has recently been recommended by a number of writers for replicated experiments
in which the original data show a markedly skew distribution. The use of transformations
to obtain an approximate analysis has been supported mainly on the grounds that in the
transformed scale the true experimental error variance is approximately the same on all
plots. This paper considers the relation of the method of transformations to a more exact
analysis. Discussion is confined to the $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$ transformations, which appear
to receive the most frequent use in practice.

To obtain an exact analysis, it is necessary to specify (i) how the expected value on any
plot is obtained from unknown parameters representing the treatment and block (or row
and column) effects, and (ii) how the observed values on the plots vary about the expected
values. If the latter variation follows the Poisson law (a case to which the square root
transformation has been considered appropriate), the equations of estimation by maximum
likelihood take the form

$$\sum \frac{x - m}{m}\,\frac{\partial m}{\partial c} = 0, \qquad (1)$$

where $x$ is the observed and $m$ the expected value on any plot, $c$ is a typical unknown
parameter, and the summation extends over all plots whose expectations involve $c$. As the
number of parameters is usually large (e.g. 16 in a $6 \times 6$ Latin square), these equations are
laborious to solve; moreover, the question of obtaining small-sample tests of significance is
difficult. It is shown that if a particular form can be assumed for the prediction formula
in (i), namely that $\sqrt{m}$ is a linear function of the treatment and block (or row and column)
constants, the equations of estimation may be reduced to the simpler form

$$\sum (y' - \sqrt{m}) = 0, \qquad (2)$$

where $y' = \frac{1}{2}\big(\sqrt{m} + x/\sqrt{m}\big)$ is a function closely related to the square root of $x$. It follows
that the statistical analysis in square roots, with some slight adjustments, coincides with
the maximum likelihood solution, provided that the above form can be assumed for the
prediction formula. The appropriateness of this form in practice is briefly considered and a
"goodness of fit" test by $\chi^2$ is developed. A numerical example is worked as an illustration
and indicates that a good approximation is obtained by the transformation alone, even
with very small numbers per plot. The corresponding theory is also discussed for the inverse
sine transformation, which applies where the original data are percentages or fractions
whose experimental errors are derived from the binomial distribution.

In practice the type of analysis outlined above is unlikely to supplant the simple use of
transformations, because it can seldom be assumed that the experimental variance is
entirely of the Poisson or binomial type. The more exact analysis may, however, be
useful (i) for cases in which the plot yields are very small integers or the ratios of very
small integers, and (ii) in showing how to give proper weight to an occasional zero plot yield.
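The reduction from the Poisson likelihood equations to the square-root form follows in a few steps. If $\sqrt{m}$ is a sum of row and column constants, then $\partial m/\partial c = 2\sqrt{m}\,\partial\sqrt{m}/\partial c$, and $\partial\sqrt{m}/\partial c$ is 1 for the plots involving $c$. The likelihood condition $\sum (x - m)/m \cdot \partial m/\partial c = 0$ then becomes $\sum (x - m)/\sqrt{m} = 0$. Since $y' - \sqrt{m} = (x - m)/2\sqrt{m}$ with $y' = \frac12(\sqrt m + x/\sqrt m)$, this is the same as $\sum (y' - \sqrt m) = 0$.

The sketch below solves this by iteration for a two-way table. It starts from the plain square-root analysis, forms $y'$, refits additive row and column constants by unweighted least squares, and repeats. This amounts to Fisher scoring for a Poisson model with square-root link, whose scoring weights are constant. The helper names, the iteration count, and the requirement that every fitted $\sqrt m$ stay positive are assumptions of this illustration, not Cochran's computing scheme.

```python
import math

def fit_additive(table):
    """Least-squares fit of grand mean + row + column effects to a complete two-way table."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row = [sum(t) / c for t in table]
    col = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    return [[row[i] + col[j] - grand for j in range(c)] for i in range(r)]

def sqrt_scale_ml(x, iters=500):
    """Iterate until sum(y' - sqrt(m)) = 0 over every row and every column."""
    root = fit_additive([[math.sqrt(v) for v in t] for t in x])  # plain square-root analysis
    for _ in range(iters):
        # working variate y' = (sqrt(m) + x / sqrt(m)) / 2; assumes sqrt(m) > 0 in every cell
        work = [[0.5 * (s + v / s) for s, v in zip(srow, xrow)]
                for srow, xrow in zip(root, x)]
        root = fit_additive(work)
    return root

counts = [[3, 5, 2], [4, 9, 6], [1, 2, 2]]   # illustrative small counts per plot
root = sqrt_scale_ml(counts)
fitted_m = [[s * s for s in t] for t in root]
print(fitted_m)
```

At convergence the least-squares residuals $y' - \sqrt m$ sum to zero over each row and column. Those sums are exactly the maximum-likelihood equations for this model.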

The Standard Errors of Geometric and Harmonic Types of Index Numbers.
By Nilan Norris, Hunter College.

Various statisticians have made empirical studies of the sampling errors of certain types
of index numbers used in the United States and England. None of these writers has taken
advantage of the tools afforded by the modern theory of estimation, including fiducial
inference, as a means of arriving at direct and general expressions for estimating the
standard deviations of the sampling errors of geometric and harmonic types of index numbers.

A known expression for the first approximation to the variance of a function, as given by
the relation between the variance of the function and the variance of the argument, is
valid for that general class of distributions for which the variance and a higher moment
are finite. With the aid of this relation, there appear simple and useful forms for
estimating the standard errors of geometric and harmonic types of indexes. For sufficiently large
samples, these forms are valid for all of the types of distributions of price relatives,
production relatives, and similar observations ordinarily encountered, provided that the
necessary conditions for drawing sound inferences on the basis of sampling are satisfied,
without reference to the value of the variate.
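The first-order relation $\operatorname{Var} f(\bar u) \approx f'(\mu)^2 \operatorname{Var} \bar u$ gives short standard-error formulas for unweighted indexes. For the geometric index $G = \exp(\overline{\log p})$, it gives $\operatorname{SE}(G) \approx G\, s_{\log p}/\sqrt{n}$. For the harmonic index $H = 1/\overline{(1/p)}$, it gives $\operatorname{SE}(H) \approx H^2 s_{1/p}/\sqrt{n}$.

The sketch below computes both. It assumes a simple random sample of $n$ price relatives and unweighted indexes. These are the generic delta-method forms, not necessarily the exact expressions derived in the paper.

```python
import math
import statistics

def geometric_index(relatives):
    """Unweighted geometric index G and first-order SE: G * stdev(log p) / sqrt(n)."""
    n = len(relatives)
    logs = [math.log(p) for p in relatives]
    g = math.exp(statistics.fmean(logs))
    return g, g * statistics.stdev(logs) / math.sqrt(n)

def harmonic_index(relatives):
    """Unweighted harmonic index H = 1 / mean(1/p) and first-order SE: H^2 * stdev(1/p) / sqrt(n)."""
    n = len(relatives)
    inv = [1 / p for p in relatives]
    h = 1 / statistics.fmean(inv)
    return h, h ** 2 * statistics.stdev(inv) / math.sqrt(n)

prices = [1.02, 0.97, 1.10, 1.05, 0.99, 1.08]   # illustrative price relatives
print(geometric_index(prices), harmonic_index(prices))
```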

Necessary conditions for using tests of significance soundly in connection with index
number problems are those of realistic and intimate acquaintance with observations, and
careful attention to certain broad theoretical considerations which determine whether or
not the index is suited for the purpose for which it is used.

A Study of R. A. Fisher's z Distribution and the Related F Distribution. L. A.
Aroian, Hunter College.

The following results for the $z$ distribution and the related $F$ distribution are investigated:

(1) Geometric properties.

(2) Exact values of the semi-invariants and moments of $z$. Exact values of the first
four central moments of $F$.

(3) The approach to normality of both distributions as $n_1$ and $n_2$ become large in any
manner whatever.

(4) The Pearson types of approximating curves, the logarithmic normal approximation,
the Gram-Charlier approximation, and the uses of these in finding any level of
significance of $z$ and of $F$.
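As a small check on item (2), the sketch below compares the standard first two moments of $F$ with simulation. For $n_2 > 4$, the mean is $n_2/(n_2 - 2)$ and the variance is $2n_2^2(n_1 + n_2 - 2)/[n_1(n_2 - 2)^2(n_2 - 4)]$. Recall that $z = \frac12 \log F$. The simulation builds each $\chi^2$ variate as a gamma variate with shape $k/2$ and scale 2. The degrees of freedom, seed, and replication count are illustrative, and the higher moments treated in the paper are not reproduced here.

```python
import random

def f_mean_var(n1, n2):
    """Mean and variance of Snedecor's F with (n1, n2) degrees of freedom; needs n2 > 4."""
    mean = n2 / (n2 - 2)
    var = 2 * n2 ** 2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4))
    return mean, var

def simulate_f(n1, n2, reps=40000, seed=7):
    rng = random.Random(seed)
    # a chi-square variate with k d.f. is a gamma variate with shape k/2 and scale 2
    draws = [(rng.gammavariate(n1 / 2, 2) / n1) / (rng.gammavariate(n2 / 2, 2) / n2)
             for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((d - m) ** 2 for d in draws) / (reps - 1)
    return m, v

exact = f_mean_var(5, 20)
print(exact, simulate_f(5, 20))
```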

A Note on the Analysis of Variance with Unequal Class Frequencies. Abraham
Wald, Columbia University.

Let us consider $p$ groups of variates and denote by $m_j$ ($j = 1, \dots, p$) the number of
elements in the $j$-th group. Let $x_{ij}$ be the $i$-th element in the $j$-th group. We assume that
$x_{ij}$ is the sum of two variates $u_{ij}$ and $v_j$, i.e. $x_{ij} = u_{ij} + v_j$, where $u_{ij}$ ($i = 1, \dots, m_j$;
$j = 1, \dots, p$) is normally distributed with mean $\mu$ and variance $\sigma^2$, and $v_j$ ($j = 1, \dots, p$) is
normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the variates $u_{ij}$ and $v_j$ are supposed
to be distributed independently. The intra-class correlation $\rho$ is given by

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$

Confidence limits for $\rho$ have been derived only in the case of equal class frequencies, i.e.
$m_1 = m_2 = \dots = m_p$. We give here the confidence limits for $\rho$ in the case of unequal class
frequencies. Since $\rho$ is a monotonic function of $\sigma'^2/\sigma^2$, it is sufficient to derive confidence
limits for $\sigma'^2/\sigma^2$. Denote $\sigma'^2/\sigma^2$ by $\lambda^2$ and the arithmetic mean of the $j$-th group by $\bar{x}_j$. Let

$$w_j = \frac{m_j}{1 + m_j\lambda^2},$$

$$f(\lambda^2) = \frac{N - p}{p - 1}\,\frac{\sum_j w_j\big(\bar{x}_j - \sum_j w_j\bar{x}_j / \sum_j w_j\big)^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2},$$

and denote by $F_1$ and $F_2$ the lower and upper confidence limits, respectively, of $F$, where $F$
has the analysis of variance distribution with $p - 1$ and $N - p = m_1 + \dots + m_p - p$
degrees of freedom. Then the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the root of the
equation in $\lambda^2$:

$$f(\lambda^2) = F_2, \qquad (1)$$

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the root of

$$f(\lambda^2) = F_1. \qquad (2)$$

For calculating the roots of (1) and (2), we can make use of the fact that $f(\lambda^2)$ is
monotonically decreasing with increasing $\lambda^2$.
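Because $f(\lambda^2)$ decreases in $\lambda^2$, each confidence limit can be found by bisection. The sketch below takes
$f(\lambda^2) = \frac{N-p}{p-1}\sum_j w_j(\bar x_j - \bar x_w)^2 / \sum_j\sum_i (x_{ij} - \bar x_j)^2$, with $w_j = m_j/(1 + m_j\lambda^2)$ and $\bar x_w = \sum w_j \bar x_j/\sum w_j$. At the true $\lambda^2$, this ratio has the $F$ distribution with $p - 1$ and $N - p$ degrees of freedom.

The $F$ limits must be taken from an $F$ table; the values below are placeholders. Returning 0 when $f(0)$ is already below the target is an assumption of this sketch, as are the function names. The limits for $\rho$ follow from $\rho = \lambda^2/(1 + \lambda^2)$.

```python
def wald_f(lam2, groups):
    """f(lambda^2) for groups given as lists of observations."""
    p = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    w = [len(g) / (1 + len(g) * lam2) for g in groups]
    xw = sum(wj * mj for wj, mj in zip(w, means)) / sum(w)
    between = sum(wj * (mj - xw) ** 2 for wj, mj in zip(w, means))
    within = sum((x - mj) ** 2 for g, mj in zip(groups, means) for x in g)
    return (N - p) / (p - 1) * between / within

def solve_lambda2(target, groups, iters=200):
    """Root of f(lambda^2) = target by bisection, using that f decreases in lambda^2.

    Returns 0.0 when f(0) <= target (no positive root; truncation at 0 is an assumption).
    """
    if wald_f(0.0, groups) <= target:
        return 0.0
    lo, hi = 0.0, 1.0
    while wald_f(hi, groups) > target:   # f -> 0 as lambda^2 grows, so this terminates
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if wald_f(mid, groups) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

groups = [[1, 2, 3], [4, 5, 6, 7], [2, 3]]     # unequal class frequencies
F1, F2 = 0.5, 3.0   # placeholder F limits; use tabled values for p - 1 and N - p d.f.
lower, upper = solve_lambda2(F2, groups), solve_lambda2(F1, groups)
rho_limits = (lower / (1 + lower), upper / (1 + upper))
print(lower, upper, rho_limits)
```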


An Approach to Problems Involving Disproportionate Frequencies. Burton
D. Seeley, Washington, D. C.

Applied mechanics offers an analysis of variance solution to problems of multiple
classification involving disproportionate sub-class numbers. The quality of orthogonality may
be attained in such problems by measuring the variability between classes of any one
classification after centering the others. This approach, which is not limited by the
number of classes or the number of classifications, treats the problem involving equal sub-class
numbers as a special phase of the general analysis of variance.



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics.

ARTICLE II 

Membership

1. The membership of the Institute shall consist of Members, Fellows, Honorary
Members, and Sustaining Members.

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on
Publications

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting
of the Institute. Voting may be in person or by mail.

(a) Exception. The first group of Officers shall be elected by a majority vote of the
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers and the previous
President.

3. The Institute shall have a Committee on Membership composed of three Fellows.
At their first meeting subsequent to the adoption of this Constitution, the Board of
Directors shall elect three members as Fellows to serve as the Committee on Membership,
one member of the Committee for a term of one year, another for a term of two years,
and another for a term of three years. Thereafter the Board of Directors shall elect
from among the Fellows one member annually at their first meeting after their election
for a term of three years. The President shall designate one of the Vice-Presidents as
Chairman of this Committee.

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 

ARTICLE IV 

Meetings

1. A meeting for the presentation and discussion of papers, for the election of Officers,
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall be
given to the membership by the Secretary-Treasurer at least thirty days prior to the
date set for the meeting. All meetings except executive sessions shall be open to the
public. Only papers accepted by a Program Committee appointed by the President may
be presented to the Institute.

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the
Secretary-Treasurer at least five days before the date set therefor. Should other business
be passed upon, any member of the Committee shall have the right to reopen the
question at the next meeting.

4. At a regularly convened meeting of the Board of Directors, three members shall
constitute a quorum. At a regularly convened meeting of the Committee on Membership,
two members shall constitute a quorum.

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
Other publications may be originated by the Board of Directors as occasion arises.

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty
days before the date of the meeting at which the proposal is to be acted upon. Voting
may be in person or by mail.

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and 

Committee on Publications 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present,
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings
of the Board of Directors he may vote in all cases. At least three months before the date
of the annual meeting, the President shall appoint a Nominating Committee of three
members. It shall be the duty of the Nominating Committee to make nominations for
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all
voting members at least thirty days before the annual meeting. Additional nominations
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to
the time of the meeting.

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings
at the meetings of the Institute and of the Board of Directors, send out calls for said
meetings and, with the approval of the President and the Board, carry on the
correspondence of the Institute. Subject to the direction of the Board, he shall have charge
of the archives and other tangible and intangible property of the Institute. He shall
send out calls for annual dues and acknowledge receipt of same; pay all bills approved
by the President for expenditures authorized by the Board or the Institute; keep a
detailed account of all receipts and expenditures, prepare a financial statement at the
end of each year and present an abstract of the same at the annual meeting of the
Institute after it has been audited by a Member or Fellow of the Institute appointed by the
President as Auditor. The Auditor shall report to the President.

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of
Directors, shall have charge of all matters connected with the publications of the
Institute, and of all books, pamphlets, manuscripts and other literary or scientific material
collected by the Institute. Once a year this Committee shall cause to be printed in the
Official Journal the Constitution and By-Laws and a classified list of all the Members
and Fellows of the Institute.


ARTICLE II
Dues

1. Members shall pay five dollars at the time of admission to membership and shall
receive the full current volume of the Official Journal. Thereafter, Members shall pay
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt
from all dues.

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues
may be six months in arrears, and to accompany such notice by a copy of this Article.
If such person fail to pay such dues within three months from the date of mailing such
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors,
by whom the person's name may be stricken from the rolls and all privileges of
membership withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues.

ARTICLE III 
Salaries

1. The Institute shall not pay a salary to any Officer, Director, or member of any
committee.

ARTICLE IV 

Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a
majority vote at any regularly convened meeting of the Institute, if the proposed
amendment has been previously approved by the Board of Directors.
Baker, Dr. G. A. Experiment Station, College of Agriculture, Univ. of Calif., Davis,
Calif.

Barral-Souto, Dr. Jose. Cordoba 1469, Buenos Aires, Argentina. 

Barrett, Mr. C. S. 3145 Maple Ave., Brookfield, Ill. 
Carver, Prof. H. C. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich.

Chapman, Mr. Roy A. Forest Service, U. S. Dept. of Agriculture, 1000 Masonic Temple,
New Orleans, La.
Daly, Dr. Joseph F. 719 Irving St., N. E., Washington, D. C.

Dantzig, Mr. George B. 2609 Fulton St., Berkeley, Calif.

DeLury, Dr. D. B. Dept. of Math., Univ. of Toronto, Toronto 5, Canada.

Deming, Dr. W. E. Bureau of the Census, U. S. Dept. of Commerce, Washington, D. C.

Dodd, Prof. E. L. Dept. of Math., Univ. of Texas, Austin, Texas.

Dodge, Mr. Harold F. Bell Telephone Lab., 463 West St., New York, N. Y.

Doob, Dr. J. L. Dept. of Math., Univ. of Illinois, Urbana, Ill.

Dressel, Dr. Paul L. 6126 S. Woodlawn Ave., Chicago, Ill.
2. The Laplace-Liapounoff theorem.¹ We shall first state some definitions
and terminology which will be used throughout the paper.

If used as subscripts or superscripts, or as indices of summation or multiplication,
the letters $i, j$ will take on all integral values from 1 through $p$, the letters
$\mu, \nu$ will take on all integral values from 1 through $n$, the letters $\gamma, \delta$ will take on
all integral values from 1 through $m$, the letter $\alpha$ will take on all integral values
from 1 through $k$, and the letter $\beta$ will take on all integral values from 1 through
$k - 1$, unless explicit statement to the contrary is made.

The totality of all sets of $\nu$ real numbers will be denoted by $R^\nu$. Thus $R^\nu$ is
the combinatory product of the spaces $R^1, R^1, \dots, R^1$ ($\nu$ times).

If $x_1, \dots, x_n$ are random variables, and if $A$ is a proposition concerning
$x_1, \dots, x_n$, then by $P\{A\}$ we shall mean "the probability that $A$." The
distribution function of the random variables $x_1, \dots, x_n$ will be denoted by
$F(x_1, \dots, x_n)$, i.e.

$$F(z_1, \dots, z_n) = P\{x_1 < z_1, \dots, x_n < z_n\}$$

for all sets of $n$ real numbers $z_1, \dots, z_n$. Thus $F$ will have an operational meaning in
this paper.

If $A(x_1, \dots, x_n)$ is a function of $x_1, \dots, x_n$ defined on $R^n$ and measurable²
with respect to $F(x_1, \dots, x_n)$, then $E\{A(x_1, \dots, x_n)\}$ will be defined by the
equation

$$E\{A(x_1, \dots, x_n)\} = \int_{R^n} A(x_1, \dots, x_n)\, dF(x_1, \dots, x_n),$$

where the integral is a Lebesgue-Stieltjes or Radon integral. Hence
$|A(x_1, \dots, x_n)|$ is assumed to be integrable with respect to $F(x_1, \dots, x_n)$.

If $\Omega(y_1, \dots, y_p)$ is a single-valued measurable function of $y_1, \dots, y_p$ on
$R^p$, and if $y_i$ is a real single-valued Borel measurable³ function of $x_1, \dots, x_n$
on $R^n$, then upon substituting for $y_1, \dots, y_p$ it is seen that $\Omega(y_1, \dots, y_p)$
is a single-valued measurable function $A(x_1, \dots, x_n)$ of $x_1, \dots, x_n$ on $R^n$. If
$x_1, \dots, x_n$ are random variables, then $y_1, \dots, y_p$ are random variables,
and⁴

$$E\{\Omega(y_1, \dots, y_p)\} = E\{A(x_1, \dots, x_n)\}. \qquad (2.1)$$

¹ Although the theorems will be stated in terms of probability distributions, Borel
measurability, and Lebesgue-Stieltjes integrability, it may simplify the reading if the
words "probability distributions" are replaced by probability densities or statistical
distributions, "Borel measurability" by continuity, and "Lebesgue-Stieltjes
integrability" by Riemann integrability.

² A function $A(x_1, \dots, x_n)$ defined on $R^n$ is said to be measurable with respect to a
distribution function $F(x_1, \dots, x_n)$ if the set $E(l)$ of all $x_1, \dots, x_n$ such that
$A(x_1, \dots, x_n) < l$ is such that $\int_{E(l)} dF(x_1, \dots, x_n)$ is defined for all $l$.

³ All subsets of $R^n$ which may be formed from the totality of intervals of $R^n$ by repeated
summations or multiplications of not more than a denumerable number of intervals of
$R^n$, and $R^n$ itself, constitute the totality of Borel sets of $R^n$. The function $y(x_1, \dots, x_n)$,
defined on $R^n$, is a Borel measurable function of $x_1, \dots, x_n$ on $R^n$ if the set of values of
$x_1, \dots, x_n$ such that $y(x_1, \dots, x_n) < t$ is a Borel set for all $t$. The class of continuous
functions is contained in the class of Borel measurable functions. For further details,
see [3, chs. 1, 2], [11, ch. 3] and [17, chs. 1, 2, 3].

We shall call $E(x_i)$ the mean value of $x_i$, $\sigma_{ij}$ the covariance of $x_i$ and $x_j$,
and $\sigma_{ii}$ or $\sigma_i^2$ the variance of $x_i$, where $\sigma_{ij} = E\{(x_i - Ex_i)(x_j - Ex_j)\}$.

The Laplace-Liapounoff, or Central Limit, theorem states conditions under
which linear functions of random variables have a normal limiting distribution.
The general characteristic of the proofs of the theorem is that conditions are
placed on the random variables so that they may virtually be assumed to be
bounded. The Lindeberg⁵ condition, which we shall use, is perhaps the least
restrictive of all the conditions which require finite means and variances.

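The Lindeberg condition⁶ $\mathfrak{L}_p$: A set of random variables $x_{i\mu n}$ will be said to
satisfy the Lindeberg condition $\mathfrak{L}_p$ if there exists, for any preassigned positive
real numbers $\delta$ and $\epsilon$, a positive integer $n_0$ such that if $n > n_0$, then

$$\sum_{\mu} \int_{z_{\mu n} > \epsilon} z_{\mu n}\, dF(x_{1\mu n}, \dots, x_{p\mu n}) < \delta,$$

where

$$z_{\mu n} = x_{1\mu n}^2 + x_{2\mu n}^2 + \dots + x_{p\mu n}^2$$

and

$$\sigma_{i1n}^2 + \sigma_{i2n}^2 + \dots + \sigma_{inn}^2 = 1.$$

If

$$x_{i\mu n} = \frac{x_{i\mu}}{s_{in}}, \qquad \text{where } s_{in}^2 = \sigma_{i1}^2 + \dots + \sigma_{in}^2,$$

and the $x_{i\mu n}$ satisfy $\mathfrak{L}_p$, then we shall say that the $x_{i\mu}$ satisfy $\mathfrak{L}_p$.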
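A univariate ($p = 1$) illustration of the condition uses weighted sums of independent signs, $x_\mu = c_\mu e_\mu$ with $e_\mu = \pm 1$ equally likely. The weights and $\epsilon$ below are illustrative choices. After normalizing by $s_n$, each $z_{\mu n} = c_\mu^2/\sum_\nu c_\nu^2$ is non-random. The Lindeberg sum therefore reduces to adding the normalized squared weights that exceed $\epsilon$.

With equal weights, the sum is exactly 0 once $1/n \le \epsilon$. With geometric weights $c_\mu = 2^{-\mu}$, the first term alone keeps about three quarters of the total variance. The sum then stays near 1, so the condition fails, and the largest normalized variance does not tend to zero either.

```python
def lindeberg_sum(weights, eps):
    """Lindeberg sum for x_mu = c_mu * e_mu with e_mu = +/-1 equally likely (p = 1).

    After normalizing by s_n, z_mu = c_mu^2 / sum(c^2) is non-random, so
    sum_mu E[z_mu; z_mu > eps] is the total of those z_mu that exceed eps.
    """
    total = sum(c * c for c in weights)
    z = [c * c / total for c in weights]
    return sum(zm for zm in z if zm > eps)

eps = 0.01
equal = lindeberg_sum([1.0] * 1000, eps)                          # each z_mu = 0.001
geometric = lindeberg_sum([2.0 ** -mu for mu in range(1, 61)], eps)
print(equal, geometric)
```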

Suppose that the random variables $y_{11}, \dots, y_{pm}$ have a normal multivariate
distribution with zero means and with covariance parameters $\sigma_{i\gamma j\delta}$, where

$$\sigma_{i\gamma j\delta} = E(y_{i\gamma} y_{j\delta}), \qquad \gamma = 1, \dots, m; \ \delta = 1, \dots, m,$$

and denote the distribution function of $y_{11}, \dots, y_{pm}$ by $N(y)$. Then we may
state the Laplace-Liapounoff theorem as follows:

⁴ It is noted that $\Omega(y_1, \dots, y_p)$ is integrated with respect to $F(y_1, \dots, y_p)$ and
$A(x_1, \dots, x_n)$ is integrated with respect to $F(x_1, \dots, x_n)$.

⁵ See Cramér [3, pp. 57, 60, 114], and the references there given.

⁶ It is not difficult to show that the Lindeberg condition will be satisfied if moments of
order greater than two exist [3, p. 60], or if the conditions stated by Lévy [13, p. 207]
and [14, p. 106] are satisfied.

Theorem I. Suppose that, for each value of $n$, the random variables $x_{i\gamma\mu n}$,
which are independent for different values of $\mu$, have zero means and covariance
parameters $\sigma_{i\gamma j\delta\mu n}$, where

$$\sigma_{i\gamma j\delta\mu n} = E(x_{i\gamma\mu n} x_{j\delta\mu n}).$$

Denote by $d'_n$ the maximum of the variances $\sigma_{i\gamma i\gamma\mu n}$. If the functions $y_{i\gamma n}$ are
defined by the equations

$$y_{i\gamma n} = \sum_{\mu} x_{i\gamma\mu n},$$

it follows that

$$\sigma_{i\gamma j\delta n} = E(y_{i\gamma n} y_{j\delta n}) = \sum_{\mu} \sigma_{i\gamma j\delta\mu n}.$$

If $\lim_{n\to\infty} \sigma_{i\gamma j\delta n} = \sigma_{i\gamma j\delta}$ and if $\lim_{n\to\infty} d'_n = 0$, then a necessary and sufficient condition
that, as $n \to \infty$, the limiting distribution⁷ of $y_{11n}, \dots, y_{pmn}$ be $N(y)$ is that the
Lindeberg condition be satisfied by the $x_{i\gamma\mu n}$.

The proof of this theorem is omitted. It may readily be developed from the
proofs of Cramér [3, pp. 57, 113].

Before stating certain corollaries which are of interest, some additional
definitions are necessary.

Let $C_m, C_{m+1}, \dots$ be a sequence of $m$-rowed real matrices

$$C_n = \|c_{\gamma\mu n}\|, \qquad n = m, m + 1, \dots,$$

and let the greatest of the absolute values of the elements of $C_n$ be denoted by
$d_n$. The inner product of any two rows of $C_n$ will be denoted by $\rho_{\gamma\delta n}$, i.e.

$$\rho_{\gamma\delta n} = \sum_{\mu} c_{\gamma\mu n} c_{\delta\mu n}.$$

Let $X_1, X_2, \dots$ be a sequence of random vectors of $p$ components defined
on $R^p$, and let the components of $X_\mu$ be denoted by $x_{1\mu}, \dots, x_{p\mu}$. Let the
components of the chance matrix $y_n = \|y_{i\gamma n}\|$, which has $p$ rows and $m$ columns,
be defined by the equations

$$y_{i\gamma n} = \sum_{\mu} c_{\gamma\mu n} x_{i\mu} \qquad (2.2)$$

for each value of $n$ ($n = m, m + 1, \dots$).


The distribution functions $F(X_n)$ will be said to converge to the distribution function $F(X)$ if and only if

$$\lim_{n\to\infty}F(X_n) = F(X)$$

for every $X$ at which $F(X)$ is continuous. If $F(X)$ is continuous throughout $R^p$, then the convergence is uniform.
Suppose that

$$E(x_{i\nu}) = 0 \tag{2.3}$$

and

$$E(x_{i\mu}x_{j\nu}) = \sigma_{ij}\delta_{\mu\nu}, \tag{2.4}$$

where $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $\delta_{\mu\nu} = 0$ if $\mu \ne \nu$. (There should be no confusion of this use of the letter $\delta$ with its use as an index.) It is easy to see that if the $c_{\gamma\nu n}$ are real numbers, then

$$E(y_{i\gamma n}) = 0$$

and

$$E(y_{i\gamma n}y_{j\delta n}) = \sigma_{ij}\rho_{\gamma\delta n}.$$


Let the determinant of the positive definite symmetric matrix $(\sigma) = \|\sigma_{ij}\|$ be denoted by $\sigma$. Let the inverse matrix of $(\sigma)$ be denoted by $(\sigma)^{-1} = \|\sigma^{ij}\|$, where $\sigma^{ij}$ is the cofactor of $\sigma_{ij}$ in $(\sigma)$ divided by $\sigma$. The determinant of $(\sigma)^{-1}$ is $\sigma^{-1}$.

By $N_d(x_1, \dots, x_p; (\sigma))$ we shall mean the normal probability density with zero means and covariance parameters $\sigma_{ij}$, i.e.,

$$N_d(x_1, \dots, x_p; (\sigma)) = (2\pi)^{-p/2}\sigma^{-1/2}\exp\Big[-\tfrac12\sum_{i,j}\sigma^{ij}x_ix_j\Big], \qquad (-\infty < x_i < \infty),$$

where $(\sigma)$ is a positive definite matrix. If the random variables $x_1, \dots, x_p$ have probability density $N_d(X; (\sigma)) = N_d(x_1, \dots, x_p; (\sigma))$, where $X$ is a vector, then we shall say that $X$ has the distribution function $N(X; (\sigma))$, i.e.

$$\frac{\partial^p N(X; (\sigma))}{\partial x_1\cdots\partial x_p} = N_d(X; (\sigma))$$

or

$$\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_p}N_d(t_1, \dots, t_p; (\sigma))\,dt_1\cdots dt_p = N(X; (\sigma)).$$
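For $p = 2$ the density $N_d$ can be written out directly from the cofactor definition of $\sigma^{ij}$. The sketch below is restricted to $p = 2$ and to a symmetric positive definite $(\sigma)$ (the function names are ours); when $(\sigma)$ is diagonal the density factors into two univariate normal densities, which gives a simple check.

```python
import math


def normal_density_2(x, sigma):
    """N_d(x_1, x_2; (sigma)) for a symmetric positive definite 2 x 2 matrix (sigma)."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12  # sigma, the determinant of (sigma)
    # sigma^{ij}: cofactor of sigma_ij divided by the determinant
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    quad = sum(inv[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return (2 * math.pi) ** (-2 / 2) * det ** -0.5 * math.exp(-0.5 * quad)


def normal_density_1(x, var):
    """Univariate normal density with mean 0 and variance var."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


value = normal_density_2((1.0, -0.5), [[2.0, 0.0], [0.0, 3.0]])
product = normal_density_1(1.0, 2.0) * normal_density_1(-0.5, 3.0)
```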


Inasmuch as certain hypotheses will be used on several occasions in this paper, they are stated here.

If $X_1, X_2, \dots$ are independently distributed, if (2.3) and (2.4) hold, and if the $x$'s satisfy the condition $\mathfrak{L}_p$, then we shall say that $\mathcal{H}_p$ is true.

If $C_n$ is such that, for all $n$, the equations $\rho_{\gamma\delta n} = \delta_{\gamma\delta}$ are true, we shall say that $\mathcal{C}$ is true.

The following corollary is useful in deriving limiting distributions in the 
analysis of variance. 

Corollary I. Let $\mathcal{H}_p$ and $\mathcal{C}$ be true. Then a sufficient condition that

$$\lim_{n\to\infty}F(y_n) = \prod_\gamma N(y_{1\gamma}, \dots, y_{p\gamma}; (\sigma))$$

is $\lim_{n\to\infty}d_n = 0$.

The proof is based on the fact that the $x_{i\gamma\nu n}$ of Theorem I are given by $c_{\gamma\nu n}x_{i\nu}$. The details are omitted.
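Corollary I can be illustrated by simulation. The sketch below assumes a single row $c_\nu = 1/\sqrt{n}$ (so $\mathcal{C}$ holds and $d_n = 1/\sqrt{n}$) and strongly skewed $x_\nu$, namely exponential variables with mean 1 shifted to mean 0; the seed, sample sizes and helper names are arbitrary choices. As $d_n$ shrinks, the skewness of $y_n$ falls toward the normal value 0 while its variance stays at 1.

```python
import math
import random


def linear_form(n, rng):
    """y_n = sum_v c_v x_v with c_v = 1/sqrt(n) and x_v = Exp(1) - 1 (mean 0, variance 1)."""
    c = 1 / math.sqrt(n)
    return sum(c * (rng.expovariate(1.0) - 1.0) for _ in range(n))


def mean_var_skew(values):
    """Sample mean, variance and skewness."""
    k = len(values)
    mean = sum(values) / k
    var = sum((v - mean) ** 2 for v in values) / k
    skew = sum((v - mean) ** 3 for v in values) / k / var ** 1.5
    return mean, var, skew


rng = random.Random(1940)
results = {n: mean_var_skew([linear_form(n, rng) for _ in range(3000)]) for n in (1, 200)}
```

The population skewness of $y_n$ here is $2/\sqrt{n}$, so the drop from about 2 at $n = 1$ to about 0.14 at $n = 200$ is what the corollary leads one to expect.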

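The $pm$-rowed square matrix $(\tau) = \|\tau_{rs}\|$ is defined as follows: if $r \le m$, $s \le m$, then $\tau_{rs} = \sigma_{11}\rho_{rs}$; and if $km < r \le (k+1)m$, $lm < s \le (l+1)m$, $l, k = 0, \dots, p-1$, then $\tau_{rs} = \sigma_{k+1,l+1}\rho_{r-km,s-lm}$. The inverse matrix $(\tau)^{-1} = \|\tau^{rs}\|$, and the determinants $\tau$ of $(\tau)$ and $\tau^{-1}$ of $(\tau)^{-1}$, are defined as are $(\sigma)^{-1}$, $\sigma$ and $\sigma^{-1}$.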
Corollary II. Let $\mathcal{H}_p$ be true, and let

$$\lim_{n\to\infty}\rho_{\gamma\delta n} = \rho_{\gamma\delta}, \qquad \rho_{\gamma\gamma} = 1.$$

Then, if $\lim_{n\to\infty}d_n = 0$, it follows that

$$\lim_{n\to\infty}F(Y_n) = F(Y),$$

where $F(Y)$ is the distribution function determined by the probability density

$$(2\pi)^{-pm/2}\tau^{-1/2}\exp\Big[-\tfrac12\sum_{r,s=1}^{pm}\tau^{rs}y_{k+1,r-km}\,y_{l+1,s-lm}\Big],$$

where, if $r \le m$, $s \le m$, then $k = 0$, $l = 0$; if $r \le m$, $m < s \le 2m$, then $k = 0$, $l = 1$; and so on.

The proof is omitted.
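Written out, $(\tau)$ is the Kronecker product of $(\sigma)$ with the limiting matrix $\|\rho_{\gamma\delta}\|$: block $(k+1, l+1)$ of $(\tau)$ is $\sigma_{k+1,l+1}$ times $\|\rho_{\gamma\delta}\|$. The sketch below builds it from the index rule in the definition, using 0-based indices in the code (the function name and the numerical matrices are illustrative only).

```python
def tau_matrix(sigma, rho):
    """(tau) with tau_rs = sigma_{k+1,l+1} * rho_{r-km, s-lm} for km < r <= (k+1)m, lm < s <= (l+1)m."""
    p, m = len(sigma), len(rho)
    tau = [[0.0] * (p * m) for _ in range(p * m)]
    for r in range(p * m):            # code index r stands for r + 1 in the text
        k, r0 = divmod(r, m)          # block k and offset r - km (0-based)
        for s in range(p * m):
            l, s0 = divmod(s, m)
            tau[r][s] = sigma[k][l] * rho[r0][s0]
    return tau


sigma = [[2.0, 0.5], [0.5, 1.0]]
rho = [[1.0, 0.3], [0.3, 1.0]]
tau = tau_matrix(sigma, rho)
```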

If $Z_1, \dots, Z_t$ are random variables, then $F(X_1, \dots, X_k \mid Z_1, \dots, Z_t)$ is the distribution function of the random vectors $X_1, \dots, X_k$ for fixed values of $Z_1, \dots, Z_t$, i.e. for any fixed values of $Z_1, \dots, Z_t$,

$$P\{X_1 < X'_1, \dots, X_k < X'_k\} = F(X'_1, \dots, X'_k \mid Z_1, \dots, Z_t).$$

We shall now assume that the elements $c_{\gamma\nu n}$ of the matrix $C_n$ are Borel measurable functions of a set of random variables $Z_1, \dots, Z_{t_n}$. Then the matrix $C_n$ may be called a random matrix defined on a space $W_n$ which is the combinatory product of the spaces on which $Z_1, \dots, Z_{t_n}$ are defined. If, for each value of $n$, and for all $X^n$ and $Z^n$, the equation

$$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu \mid Z^n) \tag{2.5}$$

is satisfied, then we shall say that $\mathcal{A}$ is true. It is obvious that sufficient conditions for the truth of $\mathcal{A}$ are

$$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu),$$

or, if $t_n \ge n$,

$$F(X^n, Z^n) = F(Z_{n+1}, \dots, Z_{t_n})\cdot\prod_\nu F(X_\nu, Z_\nu),$$

or, if $t_n < n$,

$$F(X^n, Z^n) = \prod_{\nu=1}^{t_n}F(X_\nu, Z_\nu)\cdot\prod_{\nu=t_n+1}^{n}F(X_\nu).$$

The symbol $X^n$ will stand for the set of variables $X_1, \dots, X_n$, and the symbol $Z^n$ will stand for the set of variables $Z_1, \dots, Z_{t_n}$.

Inasmuch as we shall often use Fubini's theorem, it is stated here.

Theorem II. Let the distribution function of $X^n, Z^n$ be $F(X^n, Z^n)$, let the distribution function of $X^n$ for fixed values of $Z^n$ be $F(X^n \mid Z^n)$, and let the distribution function of $Z^n$ be $F(Z^n)$. Then if $A(X^n, Z^n)$ is measurable with respect to $F(X^n, Z^n)$ and if

$$\int_{R^{pn}\times W_n}|A(X^n, Z^n)|\,dF(X^n, Z^n) < \infty,$$

it follows that

$$\int_{R^{pn}}|A(X^n, Z^n)|\,dF(X^n \mid Z^n) < \infty$$

for almost all sets of values of $Z^n$, and

$$\int_{R^{pn}\times W_n}A(X^n, Z^n)\,dF(X^n, Z^n) = \int_{W_n}\Big[\int_{R^{pn}}A(X^n, Z^n)\,dF(X^n \mid Z^n)\Big]dF(Z^n).$$

In Corollary I an important condition was that the maximum of the absolute values of the elements of $C_n$ should approach zero as $n$ increased. In order to obtain a similar condition when the elements of $C_n$ are random variables, we shall define the function $d(C_n)$ as follows: for each value of $Z^n$ let $d(C_n)$ be the maximum of the absolute values of the elements of $C_n$. We shall denote $d(C_n)$ by $d_n$. If the elements of $C_n$ are Borel measurable functions, then $d_n$ is a Borel measurable function of $Z^n$. Hence $d_n$ is a random variable defined on $W_n$.

A sequence of random variables $d_1, d_2, \dots$ is said to converge in probability to zero if, given $\epsilon > 0$,

$$\lim_{n\to\infty}P\{|d_n| > \epsilon\} = 0.$$

If the sequence of functions $d_n$ converges in probability to zero, we shall say that $\mathcal{Z}$ is true.

If $\mathcal{A}$ is true, and if, for almost all values of $Z^n$, we have

$$\int_{R^p}x_{i\nu}\,dF(X_\nu \mid Z^n) = 0, \tag{2.6}$$

$$\int_{R^p}x_{i\nu}x_{j\nu}\,dF(X_\nu \mid Z^n) = \sigma_{ij}, \tag{2.7}$$

and the condition $\mathfrak{L}_p$ is satisfied with respect to the $X_\nu$ and the distribution functions $F(X_\nu \mid Z^n)$, then we shall say that $\mathcal{H}^0_p$ is true.

Proofs of Fubini's theorem with the required amount of generality will be found in [6, p. 101] and [14, p. 73].

A proposition concerning random variables is said to be true for almost all values of the variables if it is true for all values of the variables, except perhaps for a set of probability zero with respect to the distribution function of the random variables.

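If

$$\sum_\nu\int_{R^p\times W_n}c_{\gamma\nu n}c_{\delta\nu n}x_{i\nu}x_{j\nu}\,dF(X_\nu, Z^n) = \sigma_{ij}\delta_{\gamma\delta}, \tag{2.8}$$

then we shall say that $\mathcal{C}^0$ is true. It is noted that if $\mathcal{A}$ and (2.7) are true, then $\mathcal{C}^0$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $Z^n$.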

Corollary III. Let $\mathcal{C}^0$, $\mathcal{A}$ and $\mathcal{H}^0_p$ be true. Then, if $\mathcal{Z}$ is true, it follows that

$$\lim_{n\to\infty}F(Y_n) = \prod_\gamma N(y_{1\gamma}, \dots, y_{p\gamma}; (\sigma)).$$

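Proof. It is necessary to show that the condition $\mathfrak{L}_{pm}$ is satisfied by the variables $c_{\gamma\nu n}x_{i\nu}$ if the condition $\mathfrak{L}_p$ is satisfied by the variables $x_{i\nu}$, and that the condition $\mathcal{Z}$ implies that $\lim_{n\to\infty}d'_n = 0$ when the $x_{i\gamma\nu n}$ of Theorem I are set equal to the $c_{\gamma\nu n}x_{i\nu}$ of Corollary III.

If we let $A_{\nu n}^2 = \sum_\gamma(c_{\gamma\nu n}x_{i\nu})^2$ and let $s_n^2 = \sum_\nu E(A_{\nu n}^2)$, then, by (2.8), it is true that

$$s_n^2 = \sum_\gamma\sigma_{ii} = m\sigma_{ii}.$$

From $\mathcal{Z}$ and the fact that, for sufficiently large $n$, $|d_n(Z^n)| < 1$ for almost all $Z^n$, we have, for any preassigned $\epsilon$ and $\delta$,

$$\frac{1}{s_n^2}\sum_\nu\int_{A_{\nu n} > \epsilon s_n}A_{\nu n}^2\,dF(X_\nu, Z^n) \le \frac{1}{s_n^2}\sum_\nu\int_{m\,d_n^2x_{i\nu}^2 > \epsilon^2 s_n^2}A_{\nu n}^2\,dF(X_\nu, Z^n) < \delta$$

for sufficiently large $n$, since the set of $x$'s and $Z^n$ for which $m\,d_n^2x_{i\nu}^2 > \epsilon^2 s_n^2$ contains almost all the $x$'s and $Z^n$ for which $A_{\nu n} > \epsilon s_n$. Hence the condition $\mathfrak{L}_{pm}$ is satisfied by the random variables $c_{\gamma\nu n}x_{i\nu}$ with respect to the distribution functions $F(X_\nu, Z^n)$.

We now show that

$$\lim_{n\to\infty}\Big[\max_{\gamma,\nu}E[(c_{\gamma\nu n}x_{i\nu})^2]\Big] = 0.$$

It is clearly true that

$$E[(c_{\gamma\nu n}x_{i\nu})^2] \le \int_{R^p\times W_n}d_n^2x_{i\nu}^2\,dF(X_\nu, Z^n).$$

Since $d_n$ converges in probability to zero, and since $d_n^2 < 1$ for almost all $Z^n$, we can, for any $\epsilon > 0$, take $n_0$ so large that if $n > n_0$, then $P\{d_n^2 > \tfrac12\epsilon\} < \tfrac12\epsilon$. If $E$ is the set on which $d_n^2 > \tfrac12\epsilon$, we then have, for all $n > n_0$, using (2.7),

$$\int_{E}d_n^2\Big[\int_{R^p}x_{i\nu}^2\,dF(X_\nu \mid Z^n)\Big]dF(Z^n) + \int_{W_n - E}d_n^2\Big[\int_{R^p}x_{i\nu}^2\,dF(X_\nu \mid Z^n)\Big]dF(Z^n) < \epsilon\sigma_{ii},$$

and this inequality is satisfied for all $\gamma$ and $\nu$ whenever $n > n_0$.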
The following discussion is useful in obtaining the limiting distributions of statistics which occur in multivariate statistical analysis.

The letter $f$ will assume all integral values from 1 through $s$, the letter $\nu$ will assume all integral values from 1 through $n_f$, and the letters $\gamma, \delta$ will assume all integral values from 1 through $m_f$, for any $f$.

Let $X^f_1, X^f_2, \dots$ be, for any fixed $f$, a sequence of random vectors of $p_f$ components defined on $R^{p_f}$, and let the set of random vectors $X^f_\nu$ be independently distributed for any fixed $f$.

If, for each set of values of $n_1, \dots, n_s$ ($t_n$ is a function of $n_1, \dots, n_s$),

$$F(X^1_1, \dots, X^s_{n_s}, Z_1, \dots, Z_{t_n}) = \prod_f\prod_\nu F(X^f_\nu \mid Z_1, \dots, Z_{t_n})\cdot F(Z_1, \dots, Z_{t_n}),$$

we shall say that $\mathcal{A}'$ is true.

Let, for any fixed value of $f$, the matrix $C^f_n = \|c^f_{\gamma\nu n}\|$, where the $c^f_{\gamma\nu n}$ are Borel measurable functions of the $X^k$ $(k < f)$ and $Z^n$, have the same properties as $C_n$, and let $d(C^f_n)$ be the same function of $C^f_n$ that $d(C_n)$ is of $C_n$. We shall denote $d(C^f_n)$ by $d^f_n$.

Let

$$y^f_{i\gamma n} = \sum_\nu c^f_{\gamma\nu n}x^f_{i\nu},$$

and let $Y^f_n = \|y^f_{i\gamma n}\|$.

For fixed $f$, the $p_f$-rowed square matrix $(\sigma^f)$, its inverse, and so on are defined as were the same functions of the $\sigma_{ij}$ earlier in this section, but with $\sigma^f_{ij}$ replacing $\sigma_{ij}$, where

$$E(x^f_{i\nu}) = 0$$

and

$$E(x^f_{i\nu}x^f_{j\nu}) = \sigma^f_{ij}.$$


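If $\mathcal{A}'$ is true, and if for almost all values of $Z^n$ we have

$$\int_{R^{p_f}}x^f_{i\nu}\,dF(X^f_\nu \mid Z^n) = 0, \tag{2.9}$$

$$\int_{R^{p_f}}x^f_{i\nu}x^f_{j\nu}\,dF(X^f_\nu \mid Z^n) = \sigma^f_{ij}, \tag{2.10}$$

and the condition $\mathfrak{L}_{p_f}$ is satisfied with respect to the $X^f_\nu$ and the distribution functions $F(X^f_\nu \mid Z^n)$, then we shall say that $\mathcal{H}^0_{p_f}$ is true.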

If

$$\sum_\nu\int c^f_{\gamma\nu n}c^f_{\delta\nu n}x^f_{i\nu}x^f_{j\nu}\,dF(X^f_\nu, Z^n) = \sigma^f_{ij}\delta_{\gamma\delta}, \tag{2.11}$$

then we shall say that $\mathcal{C}'$ is true. It is noted that if $\mathcal{A}'$ and (2.10) are true, then $\mathcal{C}'$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $X^1, \dots, X^{f-1}$ and $Z^n$.

The superscripts $f$ and $k$ will not indicate multiplication but will only be indices.

See footnote 11.

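If $d^f_n$ converges in probability to zero as $n_f$ increases, we shall say that $\mathcal{Z}_f$ is true.

Corollary IV. Let $\mathcal{C}'$ and $\mathcal{H}^0_{p_1}, \dots, \mathcal{H}^0_{p_s}$ be true. Then, if $\mathcal{Z}_1, \dots, \mathcal{Z}_s$ are true, it follows that

$$\lim_{n_1,\dots,n_s\to\infty}F(Y^1_n, \dots, Y^s_n) = \prod_f F(Y^f),$$

where

$$F(Y^f) = \prod_\gamma N(y^f_{1\gamma}, \dots, y^f_{p_f\gamma}; (\sigma^f)).$$

The proof is almost identical with the proof of Corollary III, of which this corollary is an extension.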

It is remarked that if the statistics whose limiting distributions are desired are associated with the normal distribution, as are most statistics studied, then Corollary IV may not be the best tool to use. This is a consequence of the fact that such statistics are generally expressible as functions of uncorrelated random variables and hence are more simply discussed using Corollary I.

3. Limiting distributions of quadratic and bilinear forms. We first assume the coefficients of the forms to be constants. For each set of values of $i$, $j$, and $n$, the matrix of the bilinear form with coefficients which are real numbers,

$$b_{ijn} = \sum_{\mu,\nu}a_{\mu\nu n}x_{i\mu}x_{j\nu}, \tag{3.1}$$

will be denoted by $A_n$, and the rank of $A_n$ will be denoted by $m$. The maximum of the absolute values of the elements of $A_n$ will be denoted by $b_n$. We shall assume that there exists an orthogonal transformation,

$$y_{i\gamma n} = \sum_\nu c_{\gamma\nu n}x_{i\nu}, \tag{3.2}$$

of $x_{i1}, \dots, x_{in}$ such that

$$b_{ijn} = \sum_\gamma\lambda_\gamma y_{i\gamma n}y_{j\gamma n}, \tag{3.3}$$

where the coefficients $\lambda_\gamma$ are non-negative.

Lemma I. If $d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$, then a necessary and sufficient condition that $\lim_{n\to\infty}b_n = 0$ is $\lim_{n\to\infty}d_n = 0$.

Our theorems will not be applicable if some of the $\lambda_\gamma$ are negative and some are positive. However, if all the $\lambda_\gamma$ are non-positive, then the theorems will remain true.





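Proof. From (3.1) and (3.3) it follows that

$$a_{\mu\nu n} = \sum_\gamma\lambda_\gamma c_{\gamma\mu n}c_{\gamma\nu n}.$$

Hence $b_n \ge \lambda_\gamma c^2_{\gamma\nu n}$ and $|a_{\mu\nu n}| \le d_n^2\sum_\gamma\lambda_\gamma$. The remainder of the proof is obvious.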
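The two inequalities in the proof of Lemma I can be checked numerically. The sketch below takes a hypothetical $C_n$ with two orthonormal rows of length 4 and weights $\lambda_1 = 1$, $\lambda_2 = 2$, forms $a_{\mu\nu n} = \sum_\gamma\lambda_\gamma c_{\gamma\mu n}c_{\gamma\nu n}$, and verifies $b_n \ge \lambda_\gamma c^2_{\gamma\nu n}$ and $b_n \le d_n^2\sum_\gamma\lambda_\gamma$ (helper names are ours).

```python
def bilinear_matrix(C, lam):
    """a_{mu nu} = sum_gamma lambda_gamma c_{gamma mu} c_{gamma nu}."""
    n = len(C[0])
    return [[sum(l * row[mu] * row[nu] for l, row in zip(lam, C)) for nu in range(n)]
            for mu in range(n)]


def lemma_one_bounds(C, lam, tol=1e-12):
    """Return b_n, d_n and whether the lower and upper bounds of Lemma I hold."""
    A = bilinear_matrix(C, lam)
    b_n = max(abs(a) for row in A for a in row)  # largest element of A_n
    d_n = max(abs(c) for row in C for c in row)  # largest element of C_n
    lower = all(b_n >= l * c * c - tol for l, row in zip(lam, C) for c in row)
    upper = b_n <= d_n ** 2 * sum(lam) + tol
    return b_n, d_n, lower, upper


C = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
b_n, d_n, lower, upper = lemma_one_bounds(C, [1.0, 2.0])
```

In this example the upper bound is attained exactly: $b_n = 0.75 = d_n^2(\lambda_1 + \lambda_2)$.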

The following theorem will be the basis for a large-sample analogue of Wishart's distribution.

Theorem III. Let $\mathcal{H}_p$ be true. Then a sufficient condition that

$$\lim_{n\to\infty}F(Y_n) = \prod_\gamma N(y_{1\gamma}, \dots, y_{p\gamma}; (\sigma)),$$

where $b_{ijn} = \sum_\gamma\lambda_\gamma y_{i\gamma n}y_{j\gamma n}$, is $\lim_{n\to\infty}b_n = 0$.

Proof. According to Lemma I, the fact that $\lim_{n\to\infty}b_n = 0$ implies that $\lim_{n\to\infty}d_n = 0$. The $y_{i\gamma n}$ are such that $\mathcal{C}$ is true. Hence the hypotheses of Corollary I are satisfied and the theorem is proved.
Before stating the corollary to Theorem III, we shall prove an obvious lemma which is of constant service.

Lemma II. Let $\lim_{n\to\infty}F(X_n) = F(X)$ at all points of continuity of $F(X)$, and let

$$g_{1n} = g_1(x_{1n}, \dots, x_{pn}), \ \dots, \ g_{kn} = g_k(x_{1n}, \dots, x_{pn})$$

be Borel measurable functions of their indicated variables for each value of $n$ $(p \ge k)$, defined on $R^p$. Then

$$\lim_{n\to\infty}F(g_{1n}, \dots, g_{kn}) = F(g_1, \dots, g_k)$$

at all points of continuity of $F(g_1, \dots, g_k)$, where $g_\alpha = g_\alpha(x_1, \dots, x_p)$.

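Proof. By (2.1), we have

$$F(g_{1n}, \dots, g_{kn}) = \int_{g_\alpha(x_{1n}, \dots, x_{pn}) < g_\alpha}dF(X_n), \qquad (\alpha = 1, \dots, k), \tag{3.4}$$

where, since $g_\alpha(x_1, \dots, x_p)$ is a Borel measurable function of $x_1, \dots, x_p$, we know that $g_{1n}, \dots, g_{kn}$ have a joint distribution function $F(g_{1n}, \dots, g_{kn})$. Then, since $\lim_{n\to\infty}F(X_n) = F(X)$ at all points of continuity of $F(X)$, we have

$$\lim_{n\to\infty}\int_I dF_n(X_1, \dots, X_p) = \int_I dF(X_1, \dots, X_p)$$

uniformly in every $t_1, \dots, t_p$ interval $I$, since

$$\Big|\int_I dF_n(X_1, \dots, X_p) - \int_I dF(X_1, \dots, X_p)\Big| \le \int_I|dF_n(X_1, \dots, X_p) - dF(X_1, \dots, X_p)|,$$

where $F_n(X_1, \dots, X_p)$ stands for $F(X_{1n}, \dots, X_{pn})$ when $X_i$ and $X_{in}$ have the same numerical values. It follows from (3.4) that

$$\lim_{n\to\infty}\big[F(g_{1n}, \dots, g_{kn}) - F(g_1, \dots, g_k)\big] = 0$$

uniformly in every $t_1, \dots, t_p$ interval, and consequently

$$\lim_{n\to\infty}F(g_{1n}, \dots, g_{kn}) = F(g_1, \dots, g_k)$$

at all points of continuity of $F(g_1, \dots, g_k)$.

See Cramér, [3, p. 30] and "Additional Note" at the end of the book.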

The real-valued function $G_d(x; n, c)$ will be defined by the equations

$$G_d(0; 0, c) = 1, \qquad (-\infty < c < \infty),$$

$$G_d(x; n, c) = [\Gamma(\tfrac12 n)]^{-1}(2c)^{-n/2}x^{\frac12 n - 1}e^{-x/2c}, \qquad (0 < x < \infty;\ c > 0;\ n > 0),$$

and $G_d(x; n, c) = 0$ otherwise. The function $G(x; n, c)$ will be defined by the equation

$$G(x; n, c) = \int_0^x G_d(t; n, c)\,dt.$$

The real-valued function $G_d(x_{11}, x_{12}, \dots, x_{pp}; n, (\sigma))$ will be defined by the equations

$$G_d(0, \dots, 0; 0, (\sigma)) = 1,$$

$$G_d(x_{11}, \dots, x_{pp}; n, (\sigma)) = (2^{np}\sigma^n)^{-1/2}\pi^{-p(p-1)/4}\Big[\prod_{i=1}^p\Gamma\big(\tfrac12(n-i+1)\big)\Big]^{-1}|x|^{(n-p-1)/2}\exp\Big[-\tfrac12\sum_{i,j}\sigma^{ij}x_{ij}\Big],$$

$(0 < x_{ii} < \infty;\ x_{ij}^2 < x_{ii}x_{jj})$, $(\sigma)$ positive definite, where $|x|$ is the determinant $|x_{ij}|$, and $G_d(x_{11}, \dots, x_{pp}; n, (\sigma)) = 0$ otherwise. The function $G(x_{11}, \dots, x_{pp}; n, (\sigma))$ will be defined by the equation

$$G(x_{11}, \dots, x_{pp}; n, (\sigma)) = \int_{-\infty}^{x_{11}}\cdots\int_{-\infty}^{x_{pp}}G_d(t_{11}, \dots, t_{pp}; n, (\sigma))\,dt_{11}\,dt_{12}\cdots dt_{pp}.$$
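$G_d(x; n, c)$ is the density of $c$ times a chi-square variable with $n$ degrees of freedom. The sketch below evaluates it with `math.gamma` and approximates $G(x; n, c)$ by the midpoint rule (the function names and the step count are arbitrary choices); for $n = 2$ the integral has the closed form $1 - e^{-x/2c}$, which serves as a check.

```python
import math


def g_density(x, n, c):
    """G_d(x; n, c) for x > 0, c > 0, n > 0; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / (2 * c)) / (math.gamma(n / 2) * (2 * c) ** (n / 2))


def g_cdf(x, n, c, steps=100_000):
    """G(x; n, c): integral of G_d(t; n, c) from 0 to x by the midpoint rule."""
    h = x / steps
    return h * sum(g_density((i + 0.5) * h, n, c) for i in range(steps))
```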

We can now state the limiting-distribution analogue of Wishart's distribution.

Corollary V. If the hypotheses of Theorem III are satisfied, if $\lambda_\gamma = 1$, and if $m \ge p$, then

$$\lim_{n\to\infty}F(b_{11n}, b_{12n}, \dots, b_{ppn}) = G(b_{11}, \dots, b_{pp}; m, (\sigma)).$$

Proof. The conditions of Theorem III and Lemma II are satisfied.

Obviously, for fixed $i$, the limiting distribution of $b_{iin}$ is $G(b; m, \sigma_{ii})$, and if $i \ne j$, the limiting distribution of $b_{ijn}/m$ is the distribution of the covariance of $x_i$ and $x_j$ in a sample of $m$ independent pairs of observations.

See Wishart and Bartlett, [1, p. 266].
We proceed to the analogue for limiting distributions of one of our generalizations of the Fisher-Cochran theorem. It is first desirable to give some additional definitions.

We consider the bilinear forms

$$b^\alpha_{ijn} = \sum_{\mu,\nu}a^\alpha_{\mu\nu n}x_{i\mu}x_{j\nu}, \qquad (\alpha = 1, \dots, k), \tag{3.5}$$

with real coefficients, and we denote the matrix of $b^\alpha_{ijn}$ by $A^\alpha_n$. The rank of $A^\alpha_n$ $(\alpha < k)$ is $m_\alpha$, and the rank of $A^k_n$ is $m_{kn}$. If the maximum of the absolute values of the elements of $A^1_n, \dots, A^{k-1}_n$ is $b_n$, and if there exists an orthogonal transformation,

$$y_{i\beta n} = \sum_\nu c_{\beta\nu n}x_{i\nu}, \tag{3.6}$$

of $x_{i1}, \dots, x_{in}$ such that

$$b^\alpha_{ijn} = \sum_\beta\lambda_\beta y_{i\beta n}y_{j\beta n},$$

where $\beta$ assumes all integral values from $m_1 + \dots + m_{\alpha-1} + 1$ through $m_1 + \dots + m_\alpha$ and $\lambda_\beta$ is non-negative, then it is easy to prove, as in Lemma I, that a necessary and sufficient condition that $\lim_{n\to\infty}b_n = 0$ is $\lim_{n\to\infty}d_n = 0$, where $d_n$ is the maximum of the absolute values of the elements $c_{\beta\nu n}$.

Lemma III. Let $m = m_1 + \dots + m_{k-1}$ and let

$$\sum_{\alpha=1}^k b^\alpha_{ijn} = \sum_\nu x_{i\nu}x_{j\nu}. \tag{3.7}$$

Then a necessary and sufficient condition that

$$b^\alpha_{ijn} = \sum_\beta y_{i\beta n}y_{j\beta n},$$

where the real linear functions $y_{i\beta n}$ of $x_{i1}, \dots, x_{in}$ are given by (3.6), the linear functions (3.6) not now being assumed to be orthogonal, is

$$m_{kn} = n - m. \tag{3.8}$$

Furthermore, the functions (3.6) are orthogonal.

The proof of this lemma for the case $p = 1$ is given in [16]. The procedure to follow in extending the lemma to the cases where $p > 1$ is given in [15, p. 473]. It is noted that this lemma is more general than the lemma in [15] inasmuch as we show that the orthogonality of the transformation is a consequence of our hypotheses and not one of the hypotheses.

It is noted, however, that the increase in generality affects only the necessity, not the sufficiency, of the theorem.
Theorem IV. Let $\mathcal{H}_p$, (3.7) and (3.8) be true for all values of $n$, and suppose that $\lim_{n\to\infty}b_n = 0$. Then

$$\lim_{n\to\infty}F(y_n) = \prod_\gamma N(y_{1\gamma}, \dots, y_{p\gamma}; (\sigma)),$$

where $b^\alpha_{ijn} = \sum_\beta y_{i\beta n}y_{j\beta n}$.

The proof is omitted.

Corollary VI. If the hypotheses of Theorem IV are assumed, and if $m_\rho \ge p$ $(\rho = 1, \dots, h)$, then

$$\lim_{n\to\infty}F(b^1_{11n}, \dots, b^h_{ppn}, y_{1,m'+1,n}, \dots, y_{pmn}) = \prod_{\rho=1}^h G(b^\rho_{11}, \dots, b^\rho_{pp}; m_\rho, (\sigma))\cdot\prod_{\gamma=m'+1}^m N(y_{1\gamma}, \dots, y_{p\gamma}; (\sigma)),$$

where $m' = m_1 + \dots + m_h$.

If $p = 1$ in Theorem IV and Corollary VI, we have the large-sample analogue of the Fisher-Cochran theorem.

We now discuss limiting distributions of random variables which are bilinear and quadratic forms in one set of chance variables for fixed values of other random variables. We consider the coefficients $a_{\mu\nu n}$ and $a^\alpha_{\mu\nu n}$ of $b_{ijn}$ and $b^\alpha_{ijn}$ to be random variables. Hence the matrices $A_n$ and $A^\alpha_n$ are random matrices.

To be more explicit, let $X^f_1, X^f_2, \dots$ be a sequence of random vectors, the random vector $X^f_\nu$ having $p_f$ components $x^f_{1\nu}, \dots, x^f_{p_f\nu}$ and being defined on $R^{p_f}$. The set of random vectors $X^f_\nu$ and $Z_1, \dots, Z_{t_n}$ will be assumed to be independent.

For each value of $f$, the coefficients of the bilinear forms

$$b^\alpha_{ijnf} = \sum_{\mu,\nu}a^\alpha_{\mu\nu nf}x^f_{i\mu}x^f_{j\nu}, \qquad (i, j = 1, \dots, p_f;\ \alpha = 1, \dots, k_f), \tag{3.9}$$

will be assumed to be Borel measurable functions of the random vectors $X^1, \dots, X^{f-1}$ and $Z_1, \dots, Z_{t_n}$.

The matrix of $b^\alpha_{ijnf}$ is denoted by $A^\alpha_{nf}$. The rank of $A^\alpha_{nf}$ $(\alpha < k_f)$ is $m_{\alpha f}$, and the rank of $A^{k_f}_{nf}$ is $m_{k_fn_f}$, for all sets of values of the $a^\alpha_{\mu\nu nf}$ except, perhaps, on a set $E_{n_f}$ which is such that $\lim_{n_f\to\infty}P(E_{n_f}) = 0$.

Let the function $b(A_{nf})$ be defined as follows: for each set of values of the $X^k$ and $Z$, let $b(A_{nf})$ be the maximum of the absolute values of the elements of $A^1_{nf}, \dots, A^{k_f-1}_{nf}$. We shall denote $b(A_{nf})$ by $b^f_n$. Obviously $b^f_n$ is a Borel measurable function of the $X^k$ and $Z$. Hence

$$b^f_n = b(A_{nf})$$

is a random variable defined on $W_n\times R^{p_1n_1+\dots+p_{f-1}n_{f-1}}$.
For each value of $f$, and for almost all sets of fixed values of the $X^h$ $(h = 1, \dots, f-1)$ and $Z$, we shall assume that there exists an orthogonal transformation,

$$y^f_{i\beta n} = \sum_\nu c^f_{\beta\nu n}x^f_{i\nu}, \tag{3.10}$$

of $x^f_{i1}, \dots, x^f_{in_f}$, such that

$$b^\alpha_{ijnf} = \sum_\beta y^f_{i\beta n}y^f_{j\beta n}, \tag{3.11}$$

where $\beta$ assumes all integral values from $m_{1f} + \dots + m_{\alpha-1,f} + 1$ through $m_{1f} + \dots + m_{\alpha f}$. The coefficients of the linear forms (3.10) are real single-valued Borel measurable functions of the coefficients $a^\alpha_{\mu\nu nf}$ of the bilinear forms (3.9) for fixed values of the $X^k$ and $Z^n$. Let $d^f_n$ be the same function of the matrix

$$C^f_n = \|c^f_{\beta\nu n}\|, \qquad (\beta = 1, \dots, m),$$

where $m = m_{1f} + \dots + m_{k_f-1,f}$, that $d_n$ is of the matrix of the coefficients of the linear forms (3.6) associated with bilinear forms having constant coefficients.

Lemma IV. A necessary and sufficient condition that $b^f_n$ converge in probability to zero as $n$ increases is that $d^f_n$ converge in probability to zero as $n$ increases.

Proof. Since

$$a^\alpha_{\mu\nu nf} = \sum_\beta c^f_{\beta\mu n}c^f_{\beta\nu n},$$

we have

$$(k_f - 1)\,b^f_n \ge \sum_{\alpha=1}^{k_f-1}a^\alpha_{\nu\nu nf} \ge [c^f_{\beta\nu n}]^2$$

and

$$|a^\alpha_{\mu\nu nf}| \le \Big(\sum_\beta[c^f_{\beta\mu n}]^2\sum_\beta[c^f_{\beta\nu n}]^2\Big)^{1/2} \le m_{\alpha f}(d^f_n)^2,$$

where, in the last line, $\beta$ assumes all integral values from $m_{1f} + \dots + m_{\alpha-1,f} + 1$ through $m_{1f} + \dots + m_{\alpha f}$. The remainder of the proof is obvious.

In proving Theorem V we shall use a generalization of Lemma III which is proved in [15, p. 473].

Theorem V. Let $\mathcal{H}^0_{p_1}, \dots, \mathcal{H}^0_{p_s}$ be true, and suppose that

$$\sum_\alpha b^\alpha_{ijnf} = \sum_\nu x^f_{i\nu}x^f_{j\nu}.$$

Then, if $b^f_n$ converges in probability to zero as $n$ increases, and if $m_{k_fn_f} = n_f - m_f$, where $m_f = m_{1f} + \dots + m_{k_f-1,f}$, for all values of $n_f$, it follows that

$$\lim_{n_1,\dots,n_s\to\infty}F(y^1_{11n}, \dots, y^s_{p_sm_sn}) = \prod_f\prod_\gamma N(y^f_{1\gamma}, \dots, y^f_{p_f\gamma}; (\sigma^f)).$$

The proof is omitted.

It is not necessary that the $\lambda_\beta$ be set equal to one as in (3.11). It is only somewhat easier to state the results.
Corollary VII. If $m_{\alpha f} \ge p_f$, then

$$\lim_{n_1,\dots,n_s\to\infty}F(\dots, b^\alpha_{ijnf}, \dots) = \prod_f\prod_{\alpha=1}^{k_f-1}G(b^\alpha_{11f}, \dots, b^\alpha_{p_fp_ff}; m_{\alpha f}, (\sigma^f)).$$

The proof is omitted.

Finally, let us assume that the vectors $X^f_\nu$, for fixed $\nu$, are uncorrelated, and for fixed $f$ are independent. By that we shall mean that $E(x^f_{i\nu}x^g_{j\nu}) = \sigma^f_{ij}\delta_{fg}$ and that, for all $n$, the random vectors $X^f_\nu$ are independent for the same or different superscripts provided the subscripts are all different. Let us also assume that the coefficients of the forms (3.9) are real numbers. Thus we have weakened the hypotheses of Theorem V concerning the random vectors, and we have strengthened the hypotheses of Theorem V concerning the forms (3.9). Inasmuch as we are generally concerned with the limiting distributions of statistics which occur in the analysis of the normal distribution, and many such statistics have been shown to be invariant under transformations into uncorrelated random variables, Theorem VI and Corollary VIII will often be applicable.

Theorem VI. The statement of Theorem V is repeated.

Corollary VIII. The statement of Corollary VII is repeated.

Another extension of these theorems may be obtained by allowing all the $n_f$ to be equal, i.e. $n_1 = \dots = n_s = n$, and by putting conditions on the forms (3.9) which enable us to say that, for fixed $i$, $f$, $\gamma$ and $n$, the random variables $c^f_{\gamma\nu n}x^f_{i\nu}$ are independently distributed. Theorem I could then be used to obtain a very general result. However, except for the case dealt with above, the condition of independence appears to be rather restrictive, and the theorem is omitted.

4. Applications. We first state the strong law of large numbers and a lemma which is very useful in the discussion of limiting distributions.

A sequence of random variables $X_1, X_2, \dots$ will be said to converge with probability one to a random variable $X$ if

$$\lim_{n\to\infty}P\{|X_n - X| < \epsilon,\ |X_{n+1} - X| < \epsilon,\ \dots,\ |X_{n+p} - X| < \epsilon\} = 1$$

for every value of $p > 0$, uniformly in $p$, for every positive number $\epsilon$. Upon setting $p = 1$, it is seen that convergence with probability one implies convergence in probability.

The strong law of large numbers asserts that if the independent random variables $X, X_1, X_2, \dots$ all have the same distribution function, and if $E(X)$ is finite, then the sequence of arithmetic means $\frac1n\sum_\nu X_\nu$ converges with probability one to $E(X)$.

The regression transformation which yields the uncorrelated variables will be found in [15, p. 470, (3.2)].

See Doob [4, p. 163], and Fréchet, [9, p. 228].

See Doob [4, p. 163], and Fréchet, [9, p. 259]. A complete proof is given by Fréchet.
Hence, if $E(x_{i\nu}) = 0$ and if $\sigma_{ij}$ is finite, then $\frac1n\sum_\nu x_{i\nu}x_{j\nu} = s'_{ijn}$ converges with probability one to $\sigma_{ij}$. Since

$$\sum_\nu(x_{i\nu} - \bar x_{in})(x_{j\nu} - \bar x_{jn}) = \sum_\nu x_{i\nu}x_{j\nu} - n\bar x_{in}\bar x_{jn},$$

where $\bar x_{in}$ is the arithmetic mean of $x_{i1}, \dots, x_{in}$, and since $\bar x_{in}$ converges with probability one to zero, it follows that $s_{ijn} = \frac1n\sum_\nu(x_{i\nu} - \bar x_{in})(x_{j\nu} - \bar x_{jn})$ converges with probability one to $\sigma_{ij}$. It is, of course, assumed that the random variables $x_{i\nu}$ have the same joint distribution function for all values of $\nu$, and that the random vectors $X_\nu$ are independently distributed. The reduction of $s_{ijn}$ to $s'_{ijn}$ in the limit is an example of the possible uses of the following lemma.
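The identity used above can be checked directly: the centred and uncentred moments differ by exactly $\bar x_{in}\bar x_{jn}$, so they share a limit whenever the means tend to zero. The sketch below (helper names ours) verifies the identity on small data and, by simulation with an arbitrary seed, shows both moments near $\sigma_{ij} = 1/3$ for an assumed pair: $x_i$ uniform on $(-1, 1)$ and $x_j = x_i + e$, with $e$ an independent uniform variable on $(-1, 1)$.

```python
import random


def raw_and_centred(xi, xj):
    """Return the uncentred moment, the centred moment and the product of the two means."""
    n = len(xi)
    mi, mj = sum(xi) / n, sum(xj) / n
    s_raw = sum(a * b for a, b in zip(xi, xj)) / n
    s_centred = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / n
    return s_raw, s_centred, mi * mj


small = raw_and_centred([1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0, 5.0])

rng = random.Random(11)
xi = [rng.uniform(-1, 1) for _ in range(100_000)]
xj = [a + rng.uniform(-1, 1) for a in xi]
large = raw_and_centred(xi, xj)  # both moments should be close to 1/3
```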

Lemma V. If $\psi(t_1, \dots, t_p)$ is a continuous function of $t_1, \dots, t_p$, and if the sequence of random variables $x_{in}$ converges in probability (with probability one) to $x_i$, which may be a random variable or a constant, then the sequence of random variables $\psi(x_{1n}, \dots, x_{pn})$ converges in probability (with probability one) to $\psi(x_1, \dots, x_p)$, where some or all of the $x$'s may be constants. If $x_1, \dots, x_p$ are constants, then $\psi(t_1, \dots, t_p)$ need only be continuous in the neighborhood of $x_1, \dots, x_p$ and Borel measurable.

For a proof of part of this lemma which may be extended to yield the entire proof, see Fréchet, [9, p. 178].

Using Lemma V it is easy to see that the coefficients $r_{ij}$ of least squares equations converge with probability one to their $\beta$ values, where the $\beta$ value is obtained by substituting $\sigma_{ij}$ for $s_{ijn}$ in the expression for $r_{ij}$, assuming, of course, independent random vectors which have the same distribution functions.

Since problems in the analysis of variance may be interpreted as problems in least squares, the above comments and Lemma V will generally make it possible, when determining limiting distributions, to consider the statistics to be functions of deviations from "true" mean functions rather than "sample" mean functions.

We shall discuss, briefly, four applications of these results.

(a). The limiting distribution of the regression coefficient. Let $r_{ijn}$, the "sample" regression coefficient, be defined by the equation

$$r_{ijn} = \frac{\sum_\nu x_{i\nu}x_{j\nu}}{\sum_\nu x_{i\nu}^2},$$

where $x_{i\nu}$ and $x_{j\nu}$ are deviations from arithmetic means. If the random vectors $(x_{i\nu}, x_{j\nu})$ are independently distributed for fixed $i$, $j$, with the same distribution functions, and if $E(x_{i\nu}) = E(x_{j\nu}) = 0$, $E(x_{i\nu}x_{j\nu}) = \sigma_{ij}$, then it follows from the strong law of large numbers that $\sum_\nu x_{i\nu}^2/n$ converges to $\sigma_{ii}$ with probability one, and from the Laplace-Liapounoff theorem that $\sum_\nu(x_{i\nu}x_{j\nu} - \sigma_{ij})/\sqrt n$ has a normal limiting distribution with mean zero and variance $E[(x_{i\nu}x_{j\nu} - \sigma_{ij})^2]$. Hence, by Lemma V,

$$\sqrt n\Big(r_{ijn} - \frac{\sigma_{ij}}{\sigma_{ii}}\Big)$$

has a normal limiting distribution with mean zero and variance

$$\lim_{n\to\infty}E\Big[n\Big(r_{ijn} - \frac{\sigma_{ij}}{\sigma_{ii}}\Big)^2\Big],$$

unless that limit does not exist.
If the $x_{i\nu}$ are not random variables then, in order to apply Corollary I with $p = 1$, it is necessary that

$$\lim_{n\to\infty}\frac{\max_\nu x_{i\nu}^2}{\sum_\nu x_{i\nu}^2} = 0. \tag{4.1}$$

In that case, the limiting distribution of $(\sum_\nu x_{i\nu}^2)^{1/2}r_{ijn}$ is normal with zero mean and variance $\sigma_{jj}$. If (4.1) is not satisfied, then there is no assurance, unless the $x_{j\nu}$ are normally distributed, that the limiting distribution of $(\sum_\nu x_{i\nu}^2)^{1/2}r_{ijn}$ is normal.
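Condition (4.1) is the requirement $d_n \to 0$ for the single row $c_\nu = x_{i\nu}/(\sum_\nu x_{i\nu}^2)^{1/2}$, since $d_n^2$ equals the ratio in (4.1). The sketch below evaluates that ratio for two hypothetical fixed designs: equally spaced values $x_{i\nu} = \nu$, for which it equals $6n/((n+1)(2n+1))$ and tends to zero, and geometric values $x_{i\nu} = 2^\nu$, for which it tends to $3/4$, so that (4.1) fails.

```python
def design_ratio(xs):
    """max_v x_v^2 / sum_v x_v^2, the quantity whose limit must be zero in (4.1)."""
    squares = [x * x for x in xs]
    return max(squares) / sum(squares)


def equally_spaced(n):
    return [float(v) for v in range(1, n + 1)]


def geometric(n):
    # Python integers, so 2**v stays exact for large n instead of overflowing a float
    return [2 ** v for v in range(1, n + 1)]


ratios = {n: (design_ratio(equally_spaced(n)), design_ratio(geometric(n))) for n in (10, 100, 1000)}
```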

(b). The limiting distribution of the analysis of variance ratio. The tests of significance which occur in the analysis of variance depend on the ratio of two quadratic forms, $q_{1n}$ and $q_{2n}$, the denominator $q_{2n}$ having rank (or degrees of freedom) $m_{2n}$ increasing with $n$, and the numerator $q_{1n}$ having rank $m_1$ not changing with $n$, i.e.,

$$v_n = \frac{q_{1n}/m_1}{q_{2n}/m_{2n}},$$

where $q_{1n} + q_{2n} + q_{3n} = \sum_\nu x_\nu^2$, and $q_{3n}$ is a quadratic form of rank $m_{3n}$ which will be identically zero if $n = m_1 + m_{2n}$. Since $q_{2n}/m_{2n}$ is expressible as the variance of $x$ about a least squares equation, it follows from the previous discussion and Lemma V that $q_{2n}/m_{2n}$ converges with probability one to $\sigma^2$ under the assumptions that the $x_\nu$ are independently distributed with zero means and variances $\sigma^2$. Hence the limiting distribution of $v_n$ will depend only on the limiting distribution of $q_{1n}$, and it will consequently be necessary to consider only the matrix of $q_{1n}$ in order to apply Corollary VI with $p = 1$. For example, if there are $pn$ independently distributed random variables $x_{it}$ with zero means and variances $\sigma^2$, arranged in $p$ blocks of $n$ random variables each, then

$$\sum_{i,t}x_{it}^2 = n\sum_i(\bar x_i - \bar x)^2 + \sum_{i,t}(x_{it} - \bar x_i)^2 + pn\bar x^2,$$

where $\bar x_i$ is the arithmetic mean of $x_{i1}, \dots, x_{in}$ and $\bar x$ is the arithmetic mean of all the $x_{it}$. Then

$$q_{1n} = n\sum_i(\bar x_i - \bar x)^2, \qquad q_{2n} = \sum_{i,t}(x_{it} - \bar x_i)^2,$$

$$m_1 = p - 1, \qquad m_{2n} = p(n-1).$$

This has been proved by Kolodziejczyk, [12, p. 161].

Other schemes are given in Fisher, [8].
and the matrix of $q_{1n}$ may be obtained by substituting for the $\bar x_i$ and $\bar x$ their expressions in the $x_{it}$. In this case the element of the matrix of $q_{1n}$ belonging to two observations in the same block is $1/n - 1/(pn)$, and that belonging to two observations in different blocks is $-1/(pn)$, so that the maximum of the absolute values of the elements of the matrix of $q_{1n}$ approaches zero as $n$ increases. Hence, if the $x_{it}$ satisfy the condition $\mathfrak{L}$, the limiting distribution of $m_1v_n$ is $G(v; p-1, 1)$.

Clearly, if only the rank of $q_{3n}$ increases as $n$ increases, the rank $m_2$ of $q_{2n}$ being constant, and if the maximum of the absolute values of the elements of the matrix of $q_{2n}$ also approaches zero as $n$ increases, then $v_n$ will have a limiting distribution which is the analysis of variance distribution, and the limiting distribution of $q_{1n}/(q_{1n} + q_{2n})$ will be the correlation ratio distribution.
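The one-way decomposition in example (b) and the size of the elements of the matrix of $q_{1n}$ can be verified numerically. The sketch below (helper names ours, data drawn with an arbitrary seed from a centred uniform distribution, so not normal) checks that $q_{1n} + q_{2n} + q_{3n} = \sum x_{it}^2$ with $q_{3n} = pn\bar x^2$, and computes the largest element of the matrix of $q_{1n}$, which is $(p-1)/(pn)$ and so tends to zero.

```python
import random


def one_way_forms(blocks):
    """q1 (between blocks), q2 (within blocks), q3 = p n xbar^2 and the total sum of squares."""
    p, n = len(blocks), len(blocks[0])
    means = [sum(b) / n for b in blocks]
    grand = sum(means) / p
    q1 = n * sum((mi - grand) ** 2 for mi in means)
    q2 = sum((x - mi) ** 2 for b, mi in zip(blocks, means) for x in b)
    q3 = p * n * grand ** 2
    total = sum(x * x for b in blocks for x in b)
    return q1, q2, q3, total


def q1_largest_element(p, n):
    """Elements of the matrix of q1: 1/n - 1/(pn) within a block, -1/(pn) across blocks."""
    return max(abs(1 / n - 1 / (p * n)), 1 / (p * n))


rng = random.Random(5)
p, n = 4, 50
blocks = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(p)]
q1, q2, q3, total = one_way_forms(blocks)
v_n = (q1 / (p - 1)) / (q2 / (p * (n - 1)))  # analysis of variance ratio
```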

(c). Periodogram analysis. We need only remark that the linear functions which are used in the analysis of the Schuster periodogram meet all the requirements of Corollary I if the $x_\nu$ are independently distributed with zero means and constant variances and satisfy the condition $\mathfrak{L}$. Consequently the large-sample theory of the Schuster periodogram is the same for non-normal as it is for normal distributions.
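Assuming the periodogram is built from the usual Fourier coefficients at the frequencies $2\pi g/n$, the corresponding rows $\sqrt{2/n}\cos(2\pi g\nu/n)$ and $\sqrt{2/n}\sin(2\pi g\nu/n)$, $g = 1, \dots, m$ with $2m < n$, are orthonormal (condition $\mathcal{C}$) and have $d_n = \sqrt{2/n}$, which tends to zero. The sketch below (function name ours) checks both properties for $n = 64$, $m = 5$.

```python
import math


def fourier_rows(n, m):
    """Rows sqrt(2/n) cos(2 pi g v / n) and sqrt(2/n) sin(2 pi g v / n), v = 0..n-1, g = 1..m."""
    scale = math.sqrt(2 / n)
    rows = []
    for g in range(1, m + 1):
        rows.append([scale * math.cos(2 * math.pi * g * v / n) for v in range(n)])
        rows.append([scale * math.sin(2 * math.pi * g * v / n) for v in range(n)])
    return rows


C = fourier_rows(64, 5)
gram = [[sum(a * b for a, b in zip(r, s)) for s in C] for r in C]
d_n = max(abs(c) for row in C for c in row)
```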

(d). Multivariate analysis. We shall assume that the random vectors $X_1, X_2, \dots$ ($X_\nu$ has components $x_{1\nu}, \dots, x_{p\nu}$) are independently distributed, that (2.3) and (2.4) are satisfied, and that the condition $\mathfrak{L}_p$ is satisfied. For any fixed $n$ and $\alpha$ we shall call the determinant $D^\alpha_n$ of the forms (3.5) a generalized sum of squares, and the determinant $V^\alpha_n$ of the elements $b^\alpha_{ijn}/m_\alpha$ a generalized variance. We shall say that $D^\alpha_n$ and $V^\alpha_n$ have rank $m_\alpha$, the rank of $A^\alpha_n$. If $m_\rho$ is constant, and if (3.7) and (3.8) are true, then clearly the limiting distribution of $D^\rho_n$ is the distribution of the generalized variance of $m_\rho$ vector observations from a normal distribution with zero means and covariance parameters $\sigma_{ij}$. Under the same conditions, the limiting distribution of $D^\rho_n/V^k_n$ is the distribution of the generalized variance of $m_\rho$ vector observations from a normal distribution with zero means and covariance parameters $\delta_{ij}$. Many other similar limiting distributions are immediately derivable.

Before completing our discussion of the limiting distributions of statistics occurring in multivariate analysis, we shall state a theorem on limiting distributions which is an obvious generalization of a theorem of Doob, [4, p. 166].

Suppose that the random variables $g(n)(x_{1n} - \xi_1), \dots, g(n)(x_{pn} - \xi_p)$ have a distribution function $F(g(n)(x_{1n} - \xi_1), \dots, g(n)(x_{pn} - \xi_p))$ which is such that

$$\lim_{n\to\infty}F(g(n)(x_{1n} - \xi_1), \dots, g(n)(x_{pn} - \xi_p)) = F(X_1, \dots, X_p),$$

where $F(X_1, \dots, X_p)$ is a continuous distribution function, and suppose that $x_{in}$ converges in probability to the real number $\xi_i$. For example, if $\bar x_n = \sum_\nu x_\nu/n$, where $E(x_\nu) = 0$, $E(x_\nu^2) = 1$, and $\mathfrak{L}$ is satisfied, then $\bar x_n$ converges to zero with probability one, and $\sqrt n\,\bar x_n$ has a limiting distribution which is normal with zero mean and unit variance, i.e.

$$\lim_{n\to\infty}\big|P\{\sqrt n\,\bar x_n < X\} - N(X; 1)\big| = 0.$$

The theory of the Schuster periodogram is given by Fisher [7].

See Wilks, [18, p. 476] or Madow, [15, pp. 481, 484].

n-^« 


Theorem VII. Let $\varphi_j(t_1, \dots, t_p)$ $(j = 1, \dots, s)$ be a function of $t_1, \dots, t_p$ defined in a neighborhood $N$ of $\xi_1, \dots, \xi_p$ which, together with its partial derivatives of order $k_j + 1$, is continuous in $N$. Suppose that $k_j$ is the least value of $r_j$ such that the random variables

$$\frac{[g(n)]^{r_j}}{r_j!}\sum_{i_1,\dots,i_{r_j}}\frac{\partial^{r_j}\varphi_j}{\partial\xi_{i_1}\cdots\partial\xi_{i_{r_j}}}(x_{i_1n} - \xi_{i_1})\cdots(x_{i_{r_j}n} - \xi_{i_{r_j}})$$

have a joint limiting distribution function $D(x_1, \dots, x_s)$. Then the random variables $[g(n)]^{k_j}[\varphi_j(x_{1n}, \dots, x_{pn}) - \varphi_j(\xi_1, \dots, \xi_p)]$ have a joint limiting distribution which is given by $D(x_1, \dots, x_s)$. The value $k_j$ is greater than or equal to the minimum value of $r_j$ for which not all the partial derivatives of order $r_j$ vanish at $\xi_1, \dots, \xi_p$.

The proof is almost word for word that of Doob, the only difference being the removal of the specializing words.
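A case where the first derivatives vanish shows why $k_j$ can exceed 1. Take $p = s = 1$, $g(n) = \sqrt n$, $\xi_1 = 0$ and the hypothetical choice $\varphi(t) = t^2$; then $\varphi'(0) = 0$, $k = 2$, and $[g(n)]^2\varphi(\bar x_n) = n\bar x_n^2 = (\sqrt n\,\bar x_n)^2$, whose limiting distribution is $G(x; 1, 1)$. The sketch below simulates this with $x_\nu = \pm 1$ (an arbitrary non-normal choice, fixed seed); since $E(n\bar x_n^2) = 1$ for every $n$, the simulated mean should be near 1.

```python
import random


def scaled_phi(n, rng):
    """[g(n)]^2 * phi(xbar_n) with g(n) = sqrt(n), phi(t) = t^2 and x_v = +1 or -1."""
    xbar = sum(rng.choice((-1, 1)) for _ in range(n)) / n
    return n * xbar ** 2


rng = random.Random(7)
draws = [scaled_phi(100, rng) for _ in range(3000)]
mean_draw = sum(draws) / len(draws)
```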

We now consider the limiting distribution of the ratio of geneialized sums of 
squares X* which is defined by 


Ip = 


Dtlrl 


where X?+i is the determinant of the forms bt,k + brji = M-i • 
shown that“® 


L 

where YZ , ij = k, k + 1), is a ratio of generalized sums of squares 

(r, s = 1, ..., i; u, D = 1, ..., z - 1; b?oj = 1). 


V" _ I I 

Jt i, — - 


\bZ 


uvi 1 


Since Yt,/m,„ converges with the probability one to | (r„ 1/| o-„v |, and since, 
by Corollarv VIII the joint limiting distribution of the mt+i» 


See Goursat-Hedrick [10, p. 107] for a statement of the Taylor expansion of functions of several variables, which we use here. By … is meant the value of the corresponding partial derivative of $\varphi_j(x_1, \ldots, x_p)$ at the point $\xi_1, \ldots, \xi_p$.

See Madow [15, p. 485].
$\prod_t G(x_t; m, 1)$, it follows, by Theorem VII, that the joint limiting distribution of

the ratios of generalized sums of squares $Y_{tk}$ is

$$\prod_t G(x_t; m_t, 1),$$

and that the limiting distribution of … is $G(x; pm_t, 1)$.

In a following paper, these results will be extended to quadratic forms in 
non-central random variables. 

6. Summary. In Section 2, Theorem I, we stated a very general form of the Laplace-Liapounoff theorem based on the Lindeberg condition. In four corollaries, this theorem was shown to provide joint limiting distributions for systems of linear forms which are such that the maximum of the absolute values of their coefficients converges to zero with an increase in the size of the sample if the coefficients are constants, and converges in probability to zero with an increase in the size of the sample if the coefficients are themselves random variables. It was shown that under certain conditions functions of several random variables, which are such that each function is a linear function of certain random variables for fixed values of random variables of lower index, also have a normal multivariate limiting distribution.

These results were extended to include limiting distributions of quadratic and bilinear forms in Section 3. The method of extension was to show that necessary and sufficient conditions for the existence of systems of linear forms satisfying the conditions of Section 2 are provided by rather simple conditions, the most important of which is that the greatest of the absolute values of the elements of the matrices of the quadratic and bilinear forms approaches zero as the size of the sample increases, the ranks of the forms remaining unaltered. This led to the theorem that quadratic and bilinear forms having such matrices have $\chi^2$, covariance, or Wishart's distributions as limiting distributions. It was then shown, in Theorem IV, that if the rank of the sum of the matrices of the quadratic and bilinear forms is equal to the sum of the ranks of the matrices, and if certain of these ranks do not change as the size of the sample increases, then the system of quadratic and bilinear forms has Wishart's distribution in the limit, provided the other conditions are met. These results were then extended in Theorem V to one of the cases occurring when the coefficients of the forms are themselves random variables.

Several simple illustrations of the uses of the methods were given in Section 4. It was shown that the analysis of the variance ratios, and statistics occurring in the theory of multivariate statistical analysis, have the same limiting distributions which they would have had if their variables had been normally and independently distributed.

A generalization of Wilks' result [19, p. 323] to the case where the variates are not assumed to have a normal multivariate distribution may readily be obtained.
ON A TEST WHETHER TWO SAMPLES ARE FROM THE SAME POPULATION¹

By A. Wald² and J. Wolfowitz

1. The Problem.³ Let X and Y be two independent stochastic variables about whose cumulative distribution functions nothing is known except that they are continuous. Let $x_1, x_2, \ldots, x_m$ be a set of m independent observations on X and let $y_1, \ldots, y_n$ be a set of n independent observations on Y. It is desired to test the hypothesis (the null hypothesis) that the distribution functions of X and Y are identical.

An important step in statistical theory was made when “Student" proposed 
his ratio of mean to standard deviation for a similar purpose. In the problem 
treated by “Student” the distribution functions were assumed to be of known 
(normal) form and completely specified by two parameters. It is clear that in 
the problem to be considered here the distributions cannot be specified by any
finite number of parameters. 

It might nevertheless be argued that by virtue of the limit theorems of 
probability theory, “Student’s” ratio might be used in our problem for large 
samples. Such a procedure is open to very serious objections. The population distributions may be of such form (e.g., Cauchy distribution) that the limit theorems do not apply. Furthermore, the distributions of X and Y may be radically different and yet have the same first two moments; clearly “Student’s” ratio will not distinguish between two such distributions.

The Pearson contingency coefficient is a useful test specifically designed for the problem we are discussing here, but one which also possesses some disadvantages. The location of the class intervals is to a considerable extent arbitrary. In order to use the distribution, the numbers in each class interval must not be small; often this can be done only by having large class intervals, thus entailing a loss of information.

¹ Presented to the Institute of Mathematical Statistics at Philadelphia, December 27, 1939.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ The authors are indebted to Prof. S. S. Wilks for proposing this problem to them.

2. Preliminary remarks. Denote by $P\{X < x\}$ the probability of the relation in braces. Let $f(x)$ and $g(x)$ be the distribution functions of X and Y, respectively; e.g., $P\{X < x\} = f(x)$. Throughout this paper we shall assume that $f(x)$ and $g(x)$ are continuous.

Let the set of $m + n$ elements $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be arranged in ascending order of magnitude, and let the sequence be designated by Z, thus: $Z = z_1, z_2, \ldots, z_{m+n}$, where $z_1 < z_2 < \cdots < z_{m+n}$. ($f(x)$ and $g(x)$ were assumed to be continuous. Hence the probability is 0 that $z_i = z_{i+1}$, and therefore we may exclude this case.) Let $V = v_1, v_2, \ldots, v_{m+n}$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, \ldots, y_n$. It is easy to show that any statistic S used to test the null hypothesis should be invariant under any continuous, reciprocally one-to-one transformation of the real axis. That is to say, if $t' = \varphi(t)$ is any such transformation, then

(1) $S(x_1, \ldots, x_m, y_1, \ldots, y_n) = S(\varphi(x_1), \ldots, \varphi(x_m), \varphi(y_1), \ldots, \varphi(y_n)).$

The reason for this requirement on S is the fact that the transformed stochastic variables $X' = \varphi(X)$ and $Y' = \varphi(Y)$ are continuous and have identical distributions if and only if X and Y have identical distributions. Hence S must be a function of V only, with the added restriction that $S(V) = S(V')$, where $V' = v_{m+n}, v_{m+n-1}, \ldots, v_1$. For if S were a function of $x_1, \ldots, x_m, y_1, \ldots, y_n$ which cannot be expressed as a function of V alone, then there exists a continuous reciprocally one-to-one transformation $t' = \varphi(t)$ such that (1) is not true. On the other hand, any continuous reciprocally one-to-one transformation of the entire line into itself is monotonic and hence either leaves V invariant or else transforms it into V'.
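The invariance argument can be checked on a toy sample. The sketch below (the helper names `run_sequence` and `count_runs` are ours, and the sample values are arbitrary) builds V, applies an increasing and a decreasing transformation, and confirms that V is either unchanged or reversed into V', so the number of runs, which is symmetric under reversal, does not change.

```python
import math

def run_sequence(xs, ys):
    """Return V: 0 for an x-observation, 1 for a y-observation, in pooled ascending order."""
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    return [label for _, label in pooled]

def count_runs(v):
    """Number of maximal blocks of equal consecutive labels in v."""
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b) if v else 0

xs = [0.3, 1.7, 2.2, 4.0]
ys = [0.9, 1.1, 3.5]

v = run_sequence(xs, ys)
# An increasing transformation (exp) leaves V unchanged.
v_inc = run_sequence([math.exp(x) for x in xs], [math.exp(y) for y in ys])
# A decreasing transformation (t -> -t) reverses V, giving V'.
v_dec = run_sequence([-x for x in xs], [-y for y in ys])

print(v)                                  # [0, 1, 1, 0, 0, 1, 0]
print(v_inc == v, v_dec == v[::-1])       # True True
print(count_runs(v), count_runs(v_dec))   # 5 5
```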

3. Previous results. In an interesting paper on this problem W. R. Thompson [1] proceeds as follows. Let the sets $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be ordered in ascending order of magnitude, thus: $x_{p_1}, x_{p_2}, \ldots, x_{p_m}$ and $y_{p'_1}, y_{p'_2}, \ldots, y_{p'_n}$, where $x_{p_1} < x_{p_2} < \cdots < x_{p_m}$ and $y_{p'_1} < y_{p'_2} < \cdots < y_{p'_n}$. Let $P\{x_{p_k} < y_{p'_{k'}}\}$ denote the probability of the relation in braces under the null hypothesis ($f(x) = g(x)$). This probability is shown to be independent of $f(x)$, and the relation

(2) $P\{x_{p_k} < y_{p'_{k'}}\} = \psi(m, n, k, k')$

holds, where the right member, which is given explicitly by Thompson, is a function only of the arguments exhibited. To make a test of the null hypothesis with, say, a 5% level of significance, this writer proposes to choose k and k' so that $\psi(m, n, k, k') = .05$. The test would then consist of noticing whether $x_{p_k} < y_{p'_{k'}}$ or not. In the former case the null hypothesis is to be considered as disproved.
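Thompson's explicit expression for $\psi$ is not reproduced here. As an independent sketch (our own derivation, not Thompson's formula), $\psi$ can be computed from the event identity $x_{p_k} < y_{p'_{k'}}$ if and only if at least k of the first $k + k' - 1$ pooled order statistics are x's, which under the null hypothesis is a hypergeometric tail. The brute-force function enumerates all equally likely placements of the x's and serves as a check.

```python
from itertools import combinations
from math import comb

def psi(m, n, k, kp):
    """P{k-th smallest x < kp-th smallest y} when all m + n observations are i.i.d. continuous."""
    t = k + kp - 1                      # leading pooled positions that must contain >= k x's
    total = comb(m + n, m)
    return sum(comb(t, j) * comb(m + n - t, m - j) for j in range(k, min(t, m) + 1)) / total

def psi_bruteforce(m, n, k, kp):
    """Same probability by enumerating every placement of the x's among the m + n ranks."""
    hits = 0
    for x_pos in combinations(range(m + n), m):          # ranks held by x's, ascending
        y_pos = sorted(set(range(m + n)) - set(x_pos))
        hits += x_pos[k - 1] < y_pos[kp - 1]
    return hits / comb(m + n, m)

print(psi(4, 5, 2, 3), psi_bruteforce(4, 5, 2, 3))
```

Because $\psi$ takes only finitely many values for given m and n, an exact value of .05 is generally not attainable, so in practice k and k' would be chosen to bring $\psi$ close to the desired level.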

It is clear that this test cannot be very efficient, ignoring as it does so many 
of the relations among the observations. Except under certain rather narrow 
restrictions on the admissible alternatives, for example, that $g(x) = f(x + c)$,
where c is an arbitrary constant, the test suffers the further defect of not being 
“consistent” in a way which will be discussed below. Hence the test suggested 
by Thompson can scarcely be regarded as a satisfactory solution of the problem. 
This criticism, of course, does not apply to those sections of Thompson’s paper 
which deal with the question of estimating the so-called normal range.
4. The statistic U. A subsequence $v_{s+1}, \ldots, v_{s+r}$ of V (where r may also be 1) will be called a “run” if $v_{s+1} = \cdots = v_{s+r}$, and if $v_s \neq v_{s+1}$ when $s > 0$, and if $v_{s+r} \neq v_{s+r+1}$ when $s + r < m + n$. For example, V = 1, 0, 0, 1, 1, 0 contains the following runs: 1; 0, 0; 1, 1; 0. The statistic⁴ U defined as the number of runs in V seems a suitable statistic for testing the hypothesis that $f(x) = g(x)$. In the event that the latter identity holds, the distribution of U is independent of $f(x)$. A difference between $f(x)$ and $g(x)$ tends to decrease U. U is consistent in a sense which will be discussed below.
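The run decomposition and U can be computed with `itertools.groupby`, which groups maximal stretches of equal consecutive items. This is a minimal sketch; the helper names are our own.

```python
from itertools import groupby

def runs(v):
    """Split V into its runs: maximal blocks of equal consecutive elements."""
    return [list(block) for _, block in groupby(v)]

def u_statistic(v):
    """U = number of runs in V."""
    return len(runs(v))

v = [1, 0, 0, 1, 1, 0]
print(runs(v))         # [[1], [0, 0], [1, 1], [0]]
print(u_statistic(v))  # 4
```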

In order to derive the distribution of U under the null hypothesis, we first note that all the $\binom{m+n}{m}$ possible sequences V have the same probability

$$\binom{m+n}{m}^{-1} = \frac{m!\,n!}{(m+n)!}.$$

To see this, consider the sequence V where $v_i = 0$ $(i = 1, 2, \ldots, m)$ and $v_i = 1$ $(i = m+1, m+2, \ldots, m+n)$. Clearly the probability of the sequence is

$$q = \frac{m(m-1)\cdots 1 \cdot n(n-1)\cdots 1}{(m+n)(m+n-1)\cdots(n+1)\,n(n-1)\cdots 1}.$$

Furthermore, the probability of any other sequence is equal to the product of the factors in the numerator of q taken in a different order, divided by the product of the factors in the denominator taken in the same order. The quotient is, of course, $= q$.
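A brute-force check for small m and n, assuming (as the null hypothesis with continuous distributions implies) that every ordering of the m + n pooled observations is equally likely: each sequence V arises from exactly $m!\,n!$ of the $(m+n)!$ orderings, so each has probability $q = m!\,n!/(m+n)!$. The choice m = 2, n = 3 is arbitrary.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def sequence_counts(m, n):
    """Count how often each V arises over all (m+n)! orderings of m distinct x's
    (labels 0..m-1) and n distinct y's (labels m..m+n-1)."""
    counts = Counter()
    for order in permutations(range(m + n)):
        v = tuple(0 if item < m else 1 for item in order)
        counts[v] += 1
    return counts

m, n = 2, 3
counts = sequence_counts(m, n)
print(len(counts), comb(m + n, m))                         # 10 10
print(set(counts.values()), factorial(m) * factorial(n))   # {12} 12
```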

Let $e_0$ be the number of runs in V whose elements are 0 and let $e_1$ be the number of runs whose elements are 1. Obviously $U = e_0 + e_1$. Let the runs of each kind be arranged in the ascending order of the indices of the $v_i$. Let $r_{0j}$ be the number of elements 0 in the $j$-th run of that kind $(j = 1, 2, \ldots, e_0)$ and let $r_{1j}$ be the number of elements 1 in the $j$-th run of that kind $(j = 1, 2, \ldots, e_1)$. The following relations obviously hold:

(3) $\sum_{j=1}^{e_0} r_{0j} = m,$

(4) $\sum_{j=1}^{e_1} r_{1j} = n,$

(5) $1 \le e_0 \le m, \qquad 1 \le e_1 \le n,$

(6) $|e_0 - e_1| \le 1.$


⁴ When this paper was already in proof, our attention was called to a paper by W. L. Stevens, entitled “Distribution of groups in a sequence of alternatives,” Annals of Eugenics, Vol. 9 (1939). There a statistic, which is essentially the U statistic, is proposed for a problem different from that considered by us, and the distribution of U is obtained in a different manner. However, the application of the U statistic for the purpose herein described, the proof of consistency and the other results of our paper are not contained in it.
Hence if $U = 2k$, then $e_0 = e_1 = k$, and if $U = 2k - 1$, then either $e_0 = k$, $e_1 = k - 1$ or $e_0 = k - 1$, $e_1 = k$. The element $v_1$ of V together with the numbers $r_{01}, r_{02}, \ldots, r_{0e_0}, r_{11}, r_{12}, \ldots, r_{1e_1}$ completely determines the sequence V, whose probability is q.

Without loss of generality we may assume that $m \le n$. If $U = 2k$, $1 \le k \le m$, $v_1 = 0$, any two sequences of k positive integers each may constitute a sequence $r_{01}, \ldots, r_{0k}, r_{11}, \ldots, r_{1k}$, provided only that (3) and (4) are satisfied. The number of sequences $r_{01}, r_{02}, \ldots, r_{0k}$ which satisfy (3) is the coefficient of $a^m$ in the purely formal expansion of

$$(a + a^2 + \cdots)^k = a^k (1 - a)^{-k},$$

and hence is $\binom{m-1}{k-1}$. Similarly the number of sequences $r_{11}, r_{12}, \ldots, r_{1k}$ which satisfy (4) is found to be $\binom{n-1}{k-1}$. Bearing in mind the case $U = 2k$, $v_1 = 1$, we obtain


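(7) $P\{U = 2k\} = \dfrac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 1, 2, \ldots, m),$

where the left member denotes the probability of the relation in braces under the null hypothesis. In a similar manner we obtain

(8) $P\{U = 2k - 1\} = \dfrac{\binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 2, \ldots, m + 1),$

with the proviso that $\binom{a}{b} = 0$ if $a < b$.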
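Formulas (7) and (8) can be checked against a brute-force enumeration of all equally likely sequences V for small m and n. This is a sketch with helper names of our own; Python's `math.comb` already returns 0 when the lower index exceeds the upper, which matches the stated proviso. Using `min(m, n)` in the ranges simply avoids assuming $m \le n$.

```python
from itertools import combinations, groupby
from math import comb

def runs_pmf(m, n):
    """Exact null distribution of U from formulas (7) and (8)."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, min(m, n) + 1):                     # even values U = 2k, formula (7)
        pmf[2 * k] = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    for k in range(2, min(m, n) + 2):                     # odd values U = 2k - 1, formula (8)
        pmf[2 * k - 1] = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                          + comb(m - 1, k - 2) * comb(n - 1, k - 1)) / total
    return pmf

def runs_pmf_bruteforce(m, n):
    """Null distribution of U by enumerating every placement of the m zeros in V."""
    total = comb(m + n, m)
    pmf = {}
    for x_pos in combinations(range(m + n), m):
        v = [0 if i in x_pos else 1 for i in range(m + n)]
        u = sum(1 for _ in groupby(v))                    # number of runs
        pmf[u] = pmf.get(u, 0) + 1 / total
    return pmf

print(runs_pmf(2, 6))
```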

We shall now briefly indicate a method of obtaining the mean $E(U)$ and variance $\sigma^2(U)$ of U. For example, $E(U)$ may be obtained by performing several summations of the type

(9) $\sum_{i=0}^{m-1} i \binom{m-1}{i}\binom{n-1}{i}.$

It is easy to verify that the expression (9) is the term free of a in the purely formal expansion in a of

(10) $(m - 1)(1 + a)^{m-2}\, a \left(1 + \dfrac{1}{a}\right)^{n-1},$

and hence is

(11) $(m - 1)\dbinom{m+n-3}{m-1}.$
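The identity between (9) and (11) can be spot-checked in exact integer arithmetic. The summand used here, $i\binom{m-1}{i}\binom{n-1}{i}$, is our reading of a display that is damaged in the scan; it is the form suggested by the summation limits and by the expansion (10). The check loops over $n \ge m$, matching the assumption $m \le n$.

```python
from math import comb

def lhs(m, n):
    """Summation (9), read as sum_{i=0}^{m-1} i * C(m-1, i) * C(n-1, i)."""
    return sum(i * comb(m - 1, i) * comb(n - 1, i) for i in range(m))

def rhs(m, n):
    """Closed form (11): (m - 1) * C(m + n - 3, m - 1)."""
    return (m - 1) * comb(m + n - 3, m - 1)

print(lhs(5, 7), rhs(5, 7))   # 504 504
```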
The other summations required for the mean and variance can be carried out in a similar manner. We shall omit these tedious calculations. The results are:

(12) $E(U) = \dfrac{2mn}{m+n} + 1,$

(13) $\sigma^2(U) = \dfrac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}.$

The critical region for testing the null hypothesis on a level of significance $\beta$ is given by the inequality $U < u_0$, where $u_0$ is a function of m and n such that $P\{U < u_0\} = \beta$.
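The sketch below checks (12) and (13) against the exact distribution from (7) and (8) and picks a critical value. Because U is discrete, $P\{U < u_0\} = \beta$ can rarely be met exactly; the helper `critical_value` (our own choice, not prescribed by the text) takes the largest $u_0$ with $P\{U < u_0\} \le \beta$.

```python
from math import comb

def runs_pmf(m, n):
    """Exact null distribution of U, formulas (7) and (8)."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, min(m, n) + 1):
        pmf[2 * k] = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    for k in range(2, min(m, n) + 2):
        pmf[2 * k - 1] = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                          + comb(m - 1, k - 2) * comb(n - 1, k - 1)) / total
    return pmf

def mean_var_formulas(m, n):
    """Closed forms (12) and (13)."""
    N = m + n
    return 2 * m * n / N + 1, 2 * m * n * (2 * m * n - m - n) / (N ** 2 * (N - 1))

def critical_value(m, n, beta):
    """Largest u0 with P{U < u0} <= beta, returned with the attained size P{U < u0}."""
    pmf = runs_pmf(m, n)
    cdf_below, u0 = 0.0, min(pmf)
    for u in sorted(pmf):                 # support values are consecutive integers
        if cdf_below + pmf[u] > beta:
            break
        cdf_below += pmf[u]
        u0 = u + 1
    return u0, cdf_below

print(mean_var_formulas(50, 50))          # mean 51.0, variance about 24.747
print(critical_value(10, 12, 0.05))
```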


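5. The asymptotic distribution of U. Let $m/n = \alpha$, a positive constant. Then, as $m \to \infty$,

$$E(U) \sim \frac{2m}{1+\alpha}, \qquad \sigma^2(U) \sim \frac{4\alpha m}{(1+\alpha)^3}.$$

Theorem I. If t is any real number, the probability of the relation

$$U < \frac{2m}{1+\alpha} + t\left[\frac{4\alpha m}{(1+\alpha)^3}\right]^{1/2}$$

converges uniformly in t to

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-w^2/2}\, dw$$

as $m \to \infty$.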
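A short sketch comparing the exact moments (12) and (13) with their leading asymptotic terms, and turning Theorem I into an approximate critical value. The ratio $\alpha = 0.5$ and the level 0.05 are illustrative choices of ours; `statistics.NormalDist` supplies the normal quantile.

```python
from statistics import NormalDist

def exact_mean_var(m, n):
    """Exact null moments (12) and (13)."""
    N = m + n
    return 2 * m * n / N + 1, 2 * m * n * (2 * m * n - m - n) / (N ** 2 * (N - 1))

def asymptotic_mean_var(m, alpha):
    """Leading terms 2m/(1+alpha) and 4*alpha*m/(1+alpha)**3 for m/n = alpha fixed."""
    return 2 * m / (1 + alpha), 4 * alpha * m / (1 + alpha) ** 3

def approx_u0(m, alpha, beta):
    """Theorem I: P{U < 2m/(1+alpha) + t*sd} -> Phi(t); take t = Phi^{-1}(beta), the lower tail."""
    mean, var = asymptotic_mean_var(m, alpha)
    return mean + NormalDist().inv_cdf(beta) * var ** 0.5

alpha = 0.5
for m in (10, 100, 1000, 10000):
    n = int(m / alpha)
    (em, ev), (am, av) = exact_mean_var(m, n), asymptotic_mean_var(m, alpha)
    print(m, round(em / am, 4), round(ev / av, 4))   # both ratios approach 1
print(approx_u0(1000, 1.0, 0.05))
```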

The proof of this theorem is essentially the same as the classical proof that the binomial law converges to the normal distribution (see, for example, Fréchet [2], p. 89), and it will be unnecessary to give the details. Since the asymptotic distribution of the subpopulation of even U is the same as that of odd U, it will be sufficient to consider only the right member of (7). Let $m' = m - 1$, $n' = n - 1$, and $k' = k - 1$. We make the substitution

(14) $w = \dfrac{k' - \frac{m'}{1+\alpha'}}{\sqrt{m'}},$

where

(15) $\alpha' = \dfrac{m'}{n'},$

and evaluate the factorials by Stirling's formula. We shall give here only the results of successive simplifications. At each step we shall omit the factors free of k or w, since their product may be reconstructed from the final exponential form. Thus instead of the right member of (7) we can consider the expression:

(16) $\dbinom{m'}{k'}\dbinom{n'}{k'}.$
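Omitting factors free of k, we get

(17) $\dfrac{1}{(k-1)!\,(m-k)!\,(k-1)!\,(n-k)!},$

and by Stirling's formula, since k and m are both large:

(18) $\left[k'^{\,2k'+1}\,(m'-k')^{m'-k'+\frac12}\,(n'-k')^{n'-k'+\frac12}\right]^{-1}.$

Now apply (14). We obtain

(19) $\left[\left(\tfrac{m'}{1+\alpha'} + w\sqrt{m'}\right)^{2k'+1}\left(\tfrac{m'\alpha'}{1+\alpha'} - w\sqrt{m'}\right)^{m'-k'+\frac12}\left(\tfrac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'}\right)^{n'-k'+\frac12}\right]^{-1}.$

Dividing inside the parentheses by $\frac{m'}{1+\alpha'}$, $\frac{m'\alpha'}{1+\alpha'}$, and $\frac{m'}{\alpha'(1+\alpha')}$, respectively, and again omitting factors free of w, we get

(20) $\left[\left(1 + \tfrac{(1+\alpha')w}{\sqrt{m'}}\right)^{2k'+1}\left(1 - \tfrac{(1+\alpha')w}{\alpha'\sqrt{m'}}\right)^{m'-k'+\frac12}\left(1 - \tfrac{\alpha'(1+\alpha')w}{\sqrt{m'}}\right)^{n'-k'+\frac12}\right]^{-1}.$

Taking logarithms, expanding in powers of $1/\sqrt{m'}$, and neglecting terms in $m'^{-1/2}$ and higher orders, the result is

(21) $-\left(\tfrac{2m'}{1+\alpha'} + 2w\sqrt{m'}\right)\left(\tfrac{(1+\alpha')w}{\sqrt{m'}} - \tfrac{(1+\alpha')^2w^2}{2m'}\right) + \left(\tfrac{m'\alpha'}{1+\alpha'} - w\sqrt{m'}\right)\left(\tfrac{(1+\alpha')w}{\alpha'\sqrt{m'}} + \tfrac{(1+\alpha')^2w^2}{2\alpha'^2 m'}\right) + \left(\tfrac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'}\right)\left(\tfrac{\alpha'(1+\alpha')w}{\sqrt{m'}} + \tfrac{\alpha'^2(1+\alpha')^2w^2}{2m'}\right),$

which equals

(22) $-\dfrac{(1+\alpha')^3}{2\alpha'}\,w^2 + O(m'^{-1/2}).$

The proof of the fact that the distribution of w converges uniformly to the normal distribution with zero mean and variance $\frac{\alpha}{(1+\alpha)^3}$ can be carried out in the same way as the classical proof that the binomial law converges to the normal distribution. It is obvious that

$$\frac{k - \frac{m}{1+\alpha}}{\sqrt{m}}$$

has the same limiting distribution as w. From this and from the fact that $U = 2k$ or $2k - 1$, Theorem I follows.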
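The limit (22) can be checked numerically. The sketch below evaluates $\log[\binom{m'}{k'}\binom{n'}{k'}]$ exactly with `math.lgamma`, which accepts the non-integer k' produced by the substitution (14), subtracts its value at w = 0, and compares the result with the leading term of (22). The value m' = 10 000 and the grid of α' and w are arbitrary choices; the small remaining discrepancy is the $O(m'^{-1/2})$ term.

```python
from math import lgamma, sqrt

def log_binom(a, b):
    """log C(a, b) for real arguments via the log-gamma function."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def log_weight(m1, alpha1, w):
    """log of C(m', k') * C(n', k') with k' from substitution (14) and n' = m' / alpha'."""
    n1 = m1 / alpha1
    k1 = m1 / (1 + alpha1) + w * sqrt(m1)
    return log_binom(m1, k1) + log_binom(n1, k1)

def predicted(alpha1, w):
    """Leading term of (22): -(1 + alpha')**3 * w**2 / (2 * alpha')."""
    return -(1 + alpha1) ** 3 * w ** 2 / (2 * alpha1)

m1 = 10_000
for alpha1 in (0.5, 1.0, 2.0):
    for w in (-0.5, 0.3, 0.8):
        exact = log_weight(m1, alpha1, w) - log_weight(m1, alpha1, 0.0)
        print(alpha1, w, round(exact, 4), round(predicted(alpha1, w), 4))
```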

In using conventional tables of the Gaussian function to make tests of significance on U when m and n are large, the reader is urged not to forget that the critical region of U lies in only one tail of the curve.

6. An example. We give here a simple example illustrating the use of the statistic U and Theorem I.

Suppose 50 observations were made on X and 50 observations on Y. Suppose further that these observations are arranged in ascending order and that the i-th element of this sequence is said to have the rank i. The observations on X occupy the following ranks: 1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26, 27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65, 68, 69, 75, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95.

The observations on Y occupy the remaining ranks.

In this case, U = 34.

For m = n = 50,

$E(U) = 51,$

$\sigma^2(U) = 24.747.$

The probability of getting 34 runs or less when the distribution functions of X and Y are continuous and identical is therefore less than $5 \cdot 10^{-4}$.
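The arithmetic of the example can be reproduced as follows. The sketch reads the thirty-second X rank as 58, which keeps the list strictly increasing and gives 50 distinct ranks out of 100; it uses the one-tailed normal approximation of Theorem I without a continuity correction (our choice).

```python
from itertools import groupby
from statistics import NormalDist

x_ranks = {1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26, 27, 28, 31, 32, 38,
           42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65, 68, 69, 75, 79,
           80, 81, 86, 87, 89, 90, 91, 93, 94, 95}
N = 100                                    # 50 observations on X plus 50 on Y
v = [0 if r in x_ranks else 1 for r in range(1, N + 1)]
m, n = v.count(0), v.count(1)

u = sum(1 for _ in groupby(v))             # number of runs
mean = 2 * m * n / (m + n) + 1             # formula (12)
var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))   # formula (13)
z = (u - mean) / var ** 0.5
p_lower = NormalDist().cdf(z)              # one tail only: the critical region is small U

print(m, n, u, mean, round(var, 3), round(z, 3), p_lower)
```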

7. Consistency. We shall say that a test is “consistent” if the probability 
of rejecting the null hypothesis when it is false (i.e., the complement of the 
probability of a type II error, cf. Neyman and Pearson, [3]) approaches one 
as the sample number approaches infinity. In the literature of statistics a 
function of the observations which converges stochastically to a population 
parameter as the sample number approaches infinity, is called a “consistent” 
statistic. If a test of a hypothesis about a population parameter is made by a 
proper use of a consistent (statistic) estimate of the parameter, the test will 
be consistent also according to our definition, which thus furnishes an extension 
of the idea of consistency to the case where the alternatives to the null hypothesis cannot be specified by a finite number of parameters.

It is obvious that consistency ought to be a minimal requirement of any good 
test. It is the purpose of this section to prove that, subject to some slight and, from the practical statistical point of view, unimportant restrictions on the distribution functions, the test furnished by the statistic U is consistent.

We shall say that the distribution functions $f(x)$ and $g(x)$ satisfy the condition A if, for any arbitrarily small positive δ, there exists a finite number of closed intervals such that the probability of the sum I of these intervals is $> 1 - \delta$ according to at least one of the distribution functions $f(x)$ and $g(x)$, and such that $f(x)$ and $g(x)$ have positive continuous derivatives $f'(x)$ and $g'(x)$ in I.

In all that follows, although m and n are considered as variables, their ratio m/n is to be a constant, denoted by α. Let $\beta > 0$ denote the level of significance on which the test is to be made, so that, if $f(x) \equiv g(x)$,

(23) $P\{U < u_0(m)\} = \beta,$

where the critical region for two samples of size m and n, respectively, is given by $U < u_0(m)$.

Theorem II. If $f(x)$ and $g(x)$ satisfy condition A, and if

(24) $f(x) \not\equiv g(x),$

then

(25) $\lim_{m \to \infty} P\{U < u_0(m)\} = 1.$

The proof of this theorem will be given in several stages.

Let $E\left(\frac{U}{m}; f, g\right)$ and $\sigma^2\left(\frac{U}{m}; f, g\right)$ denote the mean and variance, respectively, of $U/m$ when X and Y have the distribution functions $f(x)$ and $g(x)$, respectively, and the sample numbers are m and n. Let the set $x_1, \ldots, x_m, y_1, \ldots, y_n$ be arranged in ascending order of magnitude, thus:

(26) $Z = z_1, z_2, \ldots, z_{m+n},$

where $z_1 < z_2 < \cdots < z_{m+n}$. The sequence

(27) $V = v_1, v_2, \ldots, v_{m+n}$

is defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, \ldots, y_n$.

Lemma 1. If the following are fulfilled:

a) $f(x) \equiv 0$ for $x < 0$, $f(x) = x$ for $0 \le x \le 1$, $f(x) \equiv 1$ for $x > 1$.

b) $g(x) \equiv 0$ for $x < 0$, $g(x) \equiv 1$ for $x > 1$.

c) The derivative $g'(x)$ of $g(x)$ exists, is continuous and positive everywhere in the interval $0 \le x \le 1$.

d) k is an arbitrary but fixed positive integer. For every m, $i_{1m} < i_{2m} < \cdots < i_{km}$ are a set of k positive integers subject only to the restriction that the least upper bound γ of the sequence $\frac{i_{km}}{m+n}$ is less than 1.

Then the expected value $E\left(\prod_{j=1}^{k} v_{i_{jm}}\right)$ satisfies the inequality

(28) $\left| E\left(\prod_{j=1}^{k} v_{i_{jm}}\right) - \prod_{j=1}^{k} \dfrac{g'(a_{jm})}{\alpha + g'(a_{jm})} \right| < \psi(m),$

where $a_{jm}$ $(j = 1, \ldots, k)$ is the root of

(29) $m a_{jm} + n g(a_{jm}) = i_{jm}$

and $\psi(m)$ depends only on m and is such that

(30) $\lim_{m \to \infty} \psi(m) = 0.$

It is easy to verify that the root $a_{jm}$ of (29) exists and is unique.

Proof: It will be sufficient to show that, for any specified set of values of

$$v_{i_{1m}}, \ldots, v_{i_{(r-1)m}}, v_{i_{(r+1)m}}, \ldots, v_{i_{km}} \qquad (r = 1, \ldots, k),$$

the conditional probability $P\{v_{i_{rm}} = 1\}$ of the relation in braces satisfies the inequality

(31) $\left| P\{v_{i_{rm}} = 1\} - \dfrac{g'(a_{rm})}{\alpha + g'(a_{rm})} \right| < \psi'(m),$

where $\psi'(m)$ depends only on m and is such that

(32) $\lim_{m \to \infty} \psi'(m) = 0.$

For each m let

(33) $v'_{i_{1m}}, \ldots, v'_{i_{(r-1)m}}, v'_{i_{(r+1)m}}, \ldots, v'_{i_{km}}$

be a fixed sequence whose elements are either 0 or 1. We shall consider the conditional probability $P\{v_{i_{rm}} = e\}$ $(e = 0, 1)$ of the relation in braces subject to the condition that

(34) $v_{i_{jm}} = v'_{i_{jm}} \qquad (j = 1, 2, \ldots, r-1, r+1, r+2, \ldots, k).$

Let a and b be two numbers such that $0 < a < b < 1$, and let $m^*$ be a non-negative integer such that $m^* \le m$ and $m^* \le [\gamma(m+n)]$, where $[\gamma(m+n)]$ denotes the largest integer $\le \gamma(m+n)$. Let $Q_m(a, b, m^*)$ denote the probability that, if $m^*$ observations are made on X and $[\gamma(m+n)] - m^*$ observations are made on Y, the following conditions will be fulfilled:

(a) the total number of observations $< a$ is exactly $i_{rm} - 1$;

(b) all observations are $< b$;

(c) if the $[\gamma(m+n)]$ observations are arranged in ascending order and if $v_j^* = 0$ or 1 according as the j-th element is an observation on X or on Y, then

(35) $v^*_{i_{jm}} = v'_{i_{jm}} \qquad (j = 1, 2, \ldots, r-1),$

and

(36) $v^*_{i_{jm}-1} = v'_{i_{jm}} \qquad (j = r+1, r+2, \ldots, k).$


It is easy to see that the probability $P_0$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 0$ is given by

(37) $P_0 = \int_0^1\!\!\int_0^1 \sum_{m^*} R_m(a, b, m^*)\, m'(1 - b)^{m'-1}(1 - g(b))^{n'}\, da\, db,$

where

(38) $R_m(a, b, m^*) = \dbinom{m}{m^*}\dbinom{n}{[\gamma(m+n)] - m^*} Q_m(a, b, m^*),$

(39) $m' = m - m^*,$

and

(40) $n' = n - [\gamma(m+n)] + m^*.$

Similarly, the probability $P_1$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 1$ is given by

(41) $P_1 = \int_0^1\!\!\int_0^1 \sum_{m^*} R_m(a, b, m^*)\, n' g'(a)(1 - b)^{m'}(1 - g(b))^{n'-1}\, da\, db.$

Then

(42) $\dfrac{P\{v_{i_{rm}} = 0\}}{P\{v_{i_{rm}} = 1\}} = \dfrac{P_0}{P_1}.$

Let n<i = 2 amd m = m + n — [7(m + n)] — no. The variables 

“ OXpm), (^CTCm+nJi — ay), all converge stochastically to 


zero. 

Let $P_0(\epsilon)$ and $P_1(\epsilon)$ denote the values of the right members of (37) and (41), respectively, if the integration is restricted to the region where $a < b$, $|a - a_{rm}| < \epsilon$, $|b - a_\gamma| < \epsilon$, and the summation is restricted to those values of $m^*$ for which … $< \epsilon$. Hence, because of the aforementioned stochastic convergence, for all sufficiently large m,

(43) $|P_s(\epsilon) - P_s| < \epsilon \qquad (s = 0, 1).$

Since $P_s > 0$, for sufficiently large m also

(44) … $< \epsilon$.

Since $g(x)$ and $g'(x)$ are continuous in the interval [0, 1] and hence uniformly continuous, it is clear that

(45) $\left| \dfrac{P_0(\epsilon)}{P_1(\epsilon)} - \dfrac{\alpha}{g'(a_{rm})} \right| < c\,\epsilon,$

where c is a fixed constant independent of m. From (44) and (45) it follows easily that, for any arbitrarily small $\epsilon'$,

(46) $\left| \dfrac{P_0}{P_1} - \dfrac{\alpha}{g'(a_{rm})} \right| < \epsilon'$

for sufficiently large m.

Since $P\{v_{i_{rm}} = 1\} = \dfrac{P_1}{P_0 + P_1}$, the required relation (31) follows. This completes the proof of Lemma 1.

Lemma 2. If conditions a, b, and c of Lemma 1 are satisfied, then

(47) $\lim_{m \to \infty} E\left(\dfrac{U}{m}; f, g\right) = 2\displaystyle\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx$

and

(48) $\lim_{m \to \infty} \sigma^2\left(\dfrac{U}{m}; f, g\right) = 0.$

Proof: Since

(49) $U = 1 + \sum_{j=2}^{m+n} (v_j - v_{j-1})^2,$

we have from Lemma 1

(50) $E\left(\dfrac{U}{m}\right) = \dfrac{1}{m}\displaystyle\sum_{j=2}^{[\gamma(m+n)]} \frac{2\alpha\, g'(a_{jm})}{\left(\alpha + g'(a_{jm})\right)^2} + \eta(m) + \eta^*(\gamma),$

where

(51) $\lim_{m \to \infty} \eta(m) = \lim_{\gamma \to 1} \eta^*(\gamma) = 0$


and $a_{jm}$ is the root of the equation

(52) $m a_{jm} + n g(a_{jm}) = j \qquad (j = 2, \ldots, m+n).$

From equation (52) it follows that

(53) $\lim_{m \to \infty} (a_{jm} - a_{(j-1)m})(m + n g'(a_{jm})) = 1$

uniformly in j. Since γ may be chosen arbitrarily near to 1, the required result (47) follows easily from (50).

It remains to consider the variance of $U/m$. The expression

$$\frac{1}{m}\left[1 + \sum_{j=2}^{m+n} v_j + \sum_{j=2}^{m+n} v_{j-1}\right]$$

differs from $\frac{2}{\alpha}$ by at most $\frac{1}{m}$, so that its variance converges to zero with $m \to \infty$.

In order to prove (48), it will be sufficient to show that the variance of

(54) $W = \dfrac{1}{m}\displaystyle\sum_{j=2}^{m+n} v_{j-1} v_j$

goes to zero with increasing m. From Lemma 1 it follows that

(55) $-z(m) < E(v_i v_j v_k v_l) - E(v_i v_j)E(v_k v_l) < z(m),$

where $\lim_{m \to \infty} |z(m)| = 0$, provided only that the integers i, j, k, l are distinct and $< \gamma(m+n)$. The variance of mW is the sum of terms of the type occurring in (55). The number of terms for which i, j, k, l are distinct is of the order $m^2$. All other terms are of size at most 2 and their number is of the order m. Since the number γ may be chosen arbitrarily near to 1, the variance of W converges to zero with $m \to \infty$.

This proves Lemma 2.
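A Monte Carlo sketch of (47) under assumed inputs: f is uniform as in condition a), and $g(x) = (x + x^2)/2$ on [0, 1] is our own choice, picked because $g'(x) = (1 + 2x)/2$ is continuous and positive as condition c) requires. U is computed through the identity (49). With α = 1 the integral in (47) equals $2(1 - \ln(5/3)) \approx 0.978$, below the null value $2/(1+\alpha) = 1$, in line with Lemma 3.

```python
import random
from math import log, sqrt

def simulate_u_over_m(m, n, rng):
    """One draw of U/m with X ~ Uniform(0, 1) and Y having g(y) = (y + y**2)/2 on [0, 1]."""
    xs = [(rng.random(), 0) for _ in range(m)]
    # Inverse transform: solve (y + y**2)/2 = u for y in [0, 1].
    ys = [((-1 + sqrt(1 + 8 * rng.random())) / 2, 1) for _ in range(n)]
    v = [label for _, label in sorted(xs + ys)]
    u = 1 + sum((a - b) ** 2 for a, b in zip(v[1:], v))   # identity (49)
    return u / m

def g_prime(x):
    """Derivative of the assumed g(x) = (x + x**2)/2."""
    return (1 + 2 * x) / 2

def limit_47(alpha, steps=100_000):
    """Midpoint-rule value of 2 * integral_0^1 g'(x) / (alpha + g'(x)) dx, as in (47)."""
    h = 1 / steps
    return 2 * h * sum(g_prime((i + 0.5) * h) / (alpha + g_prime((i + 0.5) * h))
                       for i in range(steps))

alpha = 1.0
rng = random.Random(12345)
m = n = 5000
sim = sum(simulate_u_over_m(m, n, rng) for _ in range(40)) / 40
lim = limit_47(alpha)
print(round(sim, 4), round(lim, 4), round(2 * (1 - log(5 / 3)), 4))
```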

Lemma 3. If conditions a, b, and c of Lemma 1 are fulfilled, and if (24) holds, then

(56) $r = \displaystyle\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx < \frac{1}{1+\alpha}.$

Let $a_1 < a_3$ be any two real numbers and designate $\frac{a_1 + a_3}{2}$ by $a_2$. Let $F(x)$ be defined as follows:

(57) $F(a_1) = 0, \qquad F(x) = (x - a_i)b_i + F(a_i) \qquad (a_i \le x \le a_{i+1};\ i = 1, 2).$
Let c be defined by

(58) $F(a_3) = c(a_3 - a_1).$

Then it is easy to verify that the maximum of

(59) $\displaystyle\int_{a_1}^{a_3} \frac{F'(x)}{\alpha + F'(x)}\, dx$

with respect to $b_1$ and $b_2$, subject to the restrictions that $b_1$ and $b_2$ be non-negative and that $a_1$, $a_3$ and c be fixed $(c > 0)$, occurs when and only when

(60) $b_1 = b_2 = c.$


Now define

(61) $P_{0j} = 0, \qquad P_{ij} = \dfrac{i}{2^j}, \qquad b_{ij} = \dfrac{g(P_{ij}) - g(P_{(i-1)j})}{P_{ij} - P_{(i-1)j}}, \qquad S_j = \dfrac{1}{2^j}\displaystyle\sum_{i=1}^{2^j} \frac{b_{ij}}{\alpha + b_{ij}} \qquad (i = 1, 2, \ldots, 2^j;\ j = 0, 1, 2, \ldots).$

Repeated application of the result of the previous paragraph easily gives

(62) $S_j \ge S_{j+1}.$

From (24) it follows that there exists a positive integer j' such that $S_{j'} > S_{j'+1}$. Obviously

(63) $S_0 = \dfrac{1}{1+\alpha}$

and

(64) $\lim_{j \to \infty} S_j = r.$

Hence, Lemma 3 is proved.
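The monotone sequence $S_j$ can be computed directly. This sketch assumes the dyadic partition points $P_{ij} = i/2^j$ (our reading of (61)) and again uses the illustrative $g(x) = (x + x^2)/2$ with α = 1, which does not come from the text. The sequence starts at $S_0 = 1/(1+\alpha) = 0.5$ and decreases toward $r = 1 - \ln(5/3) \approx 0.489$.

```python
from math import log

def g(x):
    """Assumed example satisfying conditions a)-c): g(x) = (x + x**2)/2 on [0, 1]."""
    return (x + x * x) / 2

def S(j, alpha):
    """S_j from (61): average of b / (alpha + b) over the 2**j dyadic slopes b of g on [0, 1]."""
    N = 2 ** j
    total = 0.0
    for i in range(1, N + 1):
        b = (g(i / N) - g((i - 1) / N)) * N      # slope of g on [(i-1)/N, i/N]
        total += b / (alpha + b)
    return total / N

alpha = 1.0
values = [S(j, alpha) for j in range(12)]
r_limit = 1 - log(5 / 3)                         # value of the integral r in (56) for this g
print([round(s, 5) for s in values], round(r_limit, 5))
```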



Proof of Theorem II. Let $\delta_1 > \delta_2 > \cdots > \delta_j > \cdots$ be an arbitrary but fixed sequence such that $\lim \delta_j = 0$. For $\delta = \delta_j$, let $I_1, \ldots, I_{k(j)}$ be a set of closed intervals such that no two intervals have an interior point in common and within which, by condition (A), $f'(x)$ and $g'(x)$ exist, are positive, and are continuous. Let $I_0$ be the complementary set (with respect to the whole line). (It is easy to see that, if condition (A) is fulfilled, such a system can be constructed.) Let $U_i$ $(i = 1, 2, \ldots, k(j))$ and $U_0$ denote, respectively, the numbers of runs caused by the observations which fall in the intervals $I_i$ and in $I_0$. Then

(65) $\left| U - \sum_{i=1}^{k(j)} U_i - U_0 \right| \le 2k(j).$
From condition (A) it follows that, with a probability arbitrarily close to 1, for sufficiently large m,

(66) $U_0 < 3pm\delta_j,$


where 


p = max 





Let $[a_i \le x \le b_i]$, $i = 1, 2, \ldots, k(j)$, denote the interval $I_i$, and let $m_i$ and $n_i$ denote the number of observations on X and Y, respectively, which fall in the interval $I_i$. Then $m_i/m$ and $n_i/n$ converge stochastically with increasing m to $[f(b_i) - f(a_i)]$ and $[g(b_i) - g(a_i)]$, respectively.

Within the interval $I_i$ $(i = 1, 2, \ldots, k(j))$ we perform the transformation

(67) $X^* = f(X), \qquad Y^* = f(Y),$

which leaves $U_i$ invariant. For fixed $m_i$, $n_i$ the relative distribution of $X^*$ is uniform and the relative distribution of $Y^*$ fulfills condition (c) of Lemma 1. Hence from Lemma 2, together with Lemma 3, we obtain that $U_i/m$ converges stochastically to its limiting expected value, which satisfies

(68) $\lim_{m \to \infty} E\left(\dfrac{U_i}{m}\right) \le \dfrac{2[f(b_i) - f(a_i)][g(b_i) - g(a_i)]}{\alpha[f(b_i) - f(a_i)] + [g(b_i) - g(a_i)]}.$

It can be verified that the sum of the second members in (68) over all values i is less than or equal to $\frac{2}{1+\alpha}$.

From (24) and condition (A) we get that, for sufficiently small $\delta_j$, there exists at least one interval for which the first member of (68) is less than the second member. Hence

(69) $S < \dfrac{2}{1+\alpha},$

where

(70) $S = \displaystyle\sum_{i=1}^{k(j)} \lim_{m \to \infty} E\left(\frac{U_i}{m}\right).$

Now take j so large that

(71) $3p\delta_j < \epsilon,$

where

(72) $0 < 3\epsilon < \dfrac{2}{1+\alpha} - S.$
Since $U_i/m$ converges stochastically to its expected value, from (65), (66), (70), (71), and (72), it follows that, with a probability arbitrarily close to 1, for sufficiently large m,

(73) $\dfrac{U}{m} < \dfrac{2}{1+\alpha} - \epsilon.$

From (23) and Theorem I we get

(74) $\lim_{m \to \infty} \dfrac{u_0(m)}{m} = \dfrac{2}{1+\alpha}.$

Theorem II follows easily from (73) and (74).
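A small simulation can illustrate what Theorem II asserts; it is not a substitute for the proof. The alternative (X uniform on (0, 1), Y distributed as the square of a uniform variable) is our own choice, and the critical value combines the exact moments (12) and (13) with the normal approximation of Theorem I rather than the exact $u_0(m)$. The rejection rate should grow toward 1 as m increases.

```python
import random
from itertools import groupby
from statistics import NormalDist

def count_runs(xs, ys):
    """U for pooled samples: number of runs in the 0/1 label sequence V."""
    v = [label for _, label in sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])]
    return sum(1 for _ in groupby(v))

def rejects(m, n, beta, rng):
    """One-sided runs test at approximate level beta under the assumed alternative
    X ~ Uniform(0, 1), Y = Uniform(0, 1)**2."""
    xs = [rng.random() for _ in range(m)]
    ys = [rng.random() ** 2 for _ in range(n)]
    N = m + n
    mean = 2 * m * n / N + 1
    sd = (2 * m * n * (2 * m * n - m - n) / (N ** 2 * (N - 1))) ** 0.5
    u0 = mean + NormalDist().inv_cdf(beta) * sd     # reject when U < u0
    return count_runs(xs, ys) < u0

def power(m, reps=200, beta=0.05, seed=1):
    """Fraction of reps (with m = n) in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    return sum(rejects(m, m, beta, rng) for _ in range(reps)) / reps

for m in (50, 1000):
    print(m, power(m))
```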


8. Remarks on a proposed test. We have already remarked in Section 3 that the test proposed by W. R. Thompson is not consistent. To show this, we shall give two distribution functions $f(x)$ and $g(x)$ such that, although these functions will be very different, the probability of rejecting the hypothesis that they are the same will not approach one as the sample number approaches infinity.

Suppose, to simplify the notation, that the observations have been ordered according to size, i.e., that $x_1 < x_2 < \cdots < x_m$ and $y_1 < y_2 < \cdots < y_n$. Suppose further that $m = n$, and that the test is to be made on a level of significance $\beta > 0$. In the right member of (2) we need not exhibit n and shall replace k and k' by $k(m)$ and $k'(m)$ to show the dependence on m. We have, under the null hypothesis,

(75) $P\{x_{k(m)} < y_{k'(m)}\} = \psi(m, k(m), k'(m)) = \beta.$

(75) P{xKim) < Vk'm] = 4>im, kirn), ¥{m)) - 0. 


The sequence $k(m)/m$ is bounded, so that there exists a monotonically increasing subsequence $m_1, m_2, \ldots$ of the sequence of integers $1, 2, \ldots$ and a number h, $0 \le h \le 1$, such that

(76) $\lim_{i \to \infty} \dfrac{k(m_i)}{m_i} = h.$

It is easy to see that then also

(77) $\lim_{i \to \infty} \dfrac{k'(m_i)}{m_i} = h.$

We shall now assume that $0 < h < 1$. If h = 0 or 1, only a trivial alteration will be needed in the argument to follow. Let ε and δ be arbitrarily small positive numbers. We now consider two populations, A and B, described as follows:

A) $f(x) \equiv g(x) \equiv x \quad (0 \le x \le 1)$,

B) $f(x) \equiv x \quad (0 \le x \le 1)$,

$\quad g(x) \equiv g(a_i) + \dfrac{(x - a_i)\,(g(a_{i+1}) - g(a_i))}{a_{i+1} - a_i} \quad (a_i \le x \le a_{i+1};\ i = 0, 1, \ldots, 4),$

where

$a_0 = 0, \qquad g(a_0) = 0,$

$a_1 = h - 2\delta > 0, \qquad g(a_1) = 0,$

$a_2 = h - \delta, \qquad g(a_2) = a_2,$

$a_3 = h + \delta < 1 - \delta, \qquad g(a_3) = a_3,$

$a_4 = 1 - \delta, \qquad g(a_4) = 1,$

$a_5 = 1, \qquad g(a_5) = 1.$

The definition of $f(x)$ and $g(x)$ outside the interval $0 \le x \le 1$ is obvious. It will be shown that even for such different populations as A and B and for samples of size greater than that of any arbitrarily assigned number, the probability of rejecting the null hypothesis if B is true will be at most $\beta + \epsilon$.

Let $h_1, h_2, h_3$ denote the number of observations on X which fall in the intervals $0 < x < a_2$, $a_2 < x < a_3$, $a_3 < x < 1$, respectively (m fixed, of course). Let $h'_1, h'_2, h'_3$ be the corresponding numbers for Y. For a fixed m, the probability of a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ is the same whether the sample be drawn from the population A or B. From (76), (77), and the multinomial law it follows that for all sufficiently large $m_i$, the probability is at least $1 - \epsilon$ of the occurrence of a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ for which $x_{k(m_i)}$ and $y_{k'(m_i)}$ will both fall in the interval $a_2 < x < a_3$. Furthermore, it is obvious that for all samples with fixed $h_1$, $h'_1$ the distribution within the interval $a_2 < x < a_3$ is the same whether the sample came from the population A or B. Hence even when the sample is drawn from the population B, the first member of (75) is $\le \beta + \epsilon$. This completes the proof of the inconsistency of the test based on (75).

This test is consistent if the alternatives to the null hypothesis are limited, for example, to those where $g(x) = f(x + c)$, $c$ a constant.

THE SUBSTITUTIVE MEAN AND CERTAIN SUBCLASSES OF THIS GENERAL MEAN

By Edward L. Dodd

1. Introduction. No general agreement has been reached, so far as I know, 
as to what constitutes a mean. A necessary condition which appears to meet 
with general approval is that a single-valued mean of a set of numbers all equal 
to a constant c should itself be equal to c. However, there appears to be some 
valid objection against imposing any other proposed condition as necessary. 

Of course, intermediacy is a condition that suggests itself at once. Indeed, in certain mean value theorems in general analysis, such as the First Theorem of the Mean for integral calculus, which I mention in Section 3, intermediacy is the main feature.

However, O. Chisini [1] insisted that intermediacy or internality is not the
chief characteristic of a statistical mean. Rather, a mean is a number to take 
the place, by substitution, of each of a set of numbers in general different. 
Such a mean may well be called a representative or substitutive mean. 

Chisini defined $m$ to be a mean of $x_1, x_2, \dots, x_n$, relative to a function $F$, provided that

(1.1) $F(m, m, \dots, m) = F(x_1, x_2, \dots, x_n).$

If, for example,

(1.2) $F(x_1, x_2, \dots, x_n) = \sum x_i^2 = n m^2,$

the mean $m$ thus obtained is the root-mean-square

(1.3) $m = \pm\left[(1/n)\sum x_i^2\right]^{1/2}.$

The choice of $F$, Chisini noted, depends upon the use to be made of the mean.

Suppose now that $f(x_1, x_2, \dots, x_n)$ is such a function that one value of

(1.4) $f(x, x, \dots, x) = x.$

And suppose that this $f$ is taken as a particular $F$ for (1.1) to determine a mean $m$ implicitly, thus

(1.5) $f(m, m, \dots, m) = f(x_1, x_2, \dots, x_n).$

Then, from (1.5) and (1.4), it follows that one value of

(1.6) $f(x_1, x_2, \dots, x_n) = m.$

And thus $f$ determines the mean $m$ both explicitly and implicitly.

It should be noted that the $F = \sum x_i^2$ in (1.2) is not itself a mean of the $x_i$.

If, in (1.2), we take $x_1 = -2$, $x_2 = 1$, $x_3 = 1$, then the double-valued mean $m = \pm 2^{1/2}$ results. Now $-2^{1/2}$ is internal, i.e., $-2 < -2^{1/2} < 1$, but $2^{1/2}$ is external, for $2^{1/2} > 1 > -2$. But since here $\sum x_i = 0$, it follows also that the standard deviation of $-2, 1, 1$ is the external mean $2^{1/2}$. Chisini [1], indeed, used the root mean square to show the possibility of external means. External means have been noted by other writers [2-7].
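The arithmetic of this example is easy to check by machine. The Python sketch below is illustrative only (the helpers `chisini_rms_roots` and `is_internal` are hypothetical names, not from the paper): it computes both roots of (1.2) for $-2, 1, 1$ and tests each for intermediacy.

```python
import math

def chisini_rms_roots(xs):
    """Both roots m of F(m, ..., m) = F(x_1, ..., x_n) with F = sum of squares, eqs. (1.2)-(1.3)."""
    r = math.sqrt(sum(x * x for x in xs) / len(xs))
    return -r, r

def is_internal(m, xs):
    """True when m lies between the smallest and the largest of the xs."""
    return min(xs) <= m <= max(xs)

xs = [-2, 1, 1]
neg, pos = chisini_rms_roots(xs)
print(neg, pos)                                     # the two roots, -2**0.5 and 2**0.5
print(is_internal(neg, xs), is_internal(pos, xs))   # True False
```

Because $\sum x_i = 0$ here, the positive root also equals the standard deviation computed with divisor $n$.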

It is noteworthy that a number of writers [8-12] have used the condition (1.4) (in general, with $f$ single-valued) as one of a set of axioms to characterize particular means. Sometimes this has appeared in the weaker form $f(1, 1, \dots, 1) = 1$.

This paper will be concerned primarily with the mean of a finite number $n$ of variates $x_1, x_2, \dots, x_n$. Possible generalizations will be mentioned briefly in Section 8.

In the conception of the substitutive mean, m, as I have been using it for some 
time, emphasis is laid upon the explicit form for m; and provision is made for 
multiple values. 

Definition of the Substitutive Mean. Let $f(x_1, x_2, \dots, x_n)$ be a function of $n$ variables $x_1, x_2, \dots, x_n$, defined at least for one set of equal values, $x_i = k$. If $c$ is any number such that $f(c, c, \dots, c)$ is defined, let one value of

(1.7) $f(c, c, \dots, c) = c.$

Then $f(x_1, x_2, \dots, x_n)$ will be said to be a substitutive mean of $x_1, x_2, \dots, x_n$.

If an original formulation of a problem does not assign to a function a value when the variables are all equal, it is sometimes possible to assign such values by continuity considerations, such as are commonly used in the "evaluation" of indeterminate forms. This will be discussed in Section 6.

In the following, when the word mean is used, it will designate the substitutive mean as defined above.

2. Classification of Means already made. Some general classes of means have already been distinguished. One important basis for a classification of means is the kind of data to be used. The data may be only qualitatively distinguishable; then numbers may be assigned to qualities. For dealing in a very general way with all kinds of data, C. Gini and L. Galvani [13], and G. Pietra [14], distinguished between data in rectilineal series, in cyclical series, and in unconnected series. These three classes are associated respectively with the straight line, the circle, and a regular polyhedron (in three dimensions, the regular tetrahedron, and in $n$ dimensions, a polyhedron with $n + 1$ vertices, each at the same distance from each of the other $n$ vertices).

For one definition of the arithmetic mean of a cyclical series, Gini uses the center of gravity principle, and this mean is computed with the aid of sines and cosines. By mechanical means, such an arithmetic mean of dates (for example, of dates of weddings) as days of a year can be found. On the rim of a wheel delicately suspended and marked off for the 365 or 366 days of a year, let small weights proportional to the number of weddings on a day be placed in the spaces assigned to the individual days. Then, when the wheel comes to rest, the arithmetic mean of the dates will be found at the lowest point of the rim. In the special case where the center of gravity of the system is at the center of the circle, the mean is indeterminate, or we may say that every day is a mean day.
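The center-of-gravity computation with sines and cosines can be sketched as follows. This assumes days are numbered $0, 1, \dots, 364$ and that each event is a unit weight on the rim at the corresponding angle; `circular_mean_day` is a hypothetical helper, and the tolerance used to detect the indeterminate case is an arbitrary choice.

```python
import math

def circular_mean_day(counts, days_in_year=365):
    """Center-of-gravity mean of a cyclical series of dates.

    counts maps a 0-based day index to the number of events on that day.
    Each event is a unit weight on the rim at angle 2*pi*day/days_in_year;
    the mean day is the direction of the resultant (the side of the wheel
    that hangs lowest).  Returns None when the resultant vanishes, the case
    in which every day is a mean day.
    """
    c = sum(w * math.cos(2 * math.pi * d / days_in_year) for d, w in counts.items())
    s = sum(w * math.sin(2 * math.pi * d / days_in_year) for d, w in counts.items())
    if math.hypot(c, s) < 1e-9:
        return None
    angle = math.atan2(s, c) % (2 * math.pi)
    return angle * days_in_year / (2 * math.pi)

# Equal weight on the last day of the year and on the first: the cyclical
# mean falls between them (about day 364.5), not at mid-year as a linear
# mean of the day numbers would.
print(circular_mean_day({364: 1, 0: 1}))
```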

Also, for cyclical series the arithmetic mean and the median are defined by 
other methods, using such principles as minimizing the sum of the squares of 
deviations or the sum of the absolute deviations. 

The properties of means may be made the basis of a classification, either those properties which have been evolved by writers [8-12], [15-18] who have characterized specific means by sets of axioms, or those properties which seem of special importance in making distinctions. Two such properties will now be mentioned.

Gini [19] recognizes two large classes of means: "A) medie ferme, B) medie lasche" (firm means and loose means), the latter class including the median and mode, whose values do not depend upon all the data. To describe this latter mean $m$ of arguments $x_i$, we might write $\partial m/\partial x_i = 0$ as applying to several if not most of the arguments over wide ranges instead of at isolated points.

Subclasses of A or firm means as given by Gini will be discussed in Section 4. 

Another rather large classification distinguishes between simple means and their weighted forms. In a case often encountered, where the weights are whole numbers indicating frequencies of occurrence, this distinction is of little significance. In the more general case, however, where weights may give ratings of the efficiency of measuring instruments or the weights may be negative [6, 20], more direct attention needs to be paid to the weighted forms.

To supplement classifications already proposed, I am indicating in the next section a descent from the substitutive mean, the most general of all means, down through two classes of means less general, which I am calling the summational mean and the quasi-arithmetic mean, to the more specific mean known as the associative mean, studied in particular by M. Nagumo [21], A. Kolmogoroff [22], and B. de Finetti [2].

The foregoing subclasses of the general or substitutive mean are based 
primarily on structure, the way the mean is formed. 

3. The Summational Mean, Quasi-Arithmetic Mean, and Associative Mean. 

The summational mean, now to be defined, is a generalization of the weighted 
arithmetic mean. 


(3.1) $W = \dfrac{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}{c_1 + c_2 + \cdots + c_n}, \qquad \sum c_i \neq 0.$

It is to be noted that although $W$ is not a symmetric function of $x_i$, $W$ is a symmetric function of $c_i x_i$. In the generalization $Q$, the following features of $W$ are retained:

1. Certain weights $c_i$ being given, $Q$ is a symmetric function of $c_i x_i$.

2. This $Q$ may be determined from sums of $n$ terms, each term involving one and only one $x_i$.

Definition. Let $\sum$ denote a summation for $i = 1, 2, \dots, n$. Suppose that

(3.2) $F\left[y, \sum f_1(c_i, x_i, y), \sum f_2(c_i, x_i, y), \dots, \sum f_k(c_i, x_i, y)\right] = 0$

has a solution, $y = Q$, which is a substitutive mean of $x_1, x_2, \dots, x_n$. Then $Q$ will be called a summational mean of $x_1, x_2, \dots, x_n$, relative to the functions $f_1, f_2, \dots, f_k$, and $F$.

Sometimes it is possible to express $Q$ as

(3.3) $Q = G\left[\sum g_1(c_i, x_i), \sum g_2(c_i, x_i), \dots, \sum g_k(c_i, x_i)\right].$

Among summational means, those of most frequent use involve in a special way but one summation. Thus, with $\psi(x)$ a function which would usually be taken as continuous, this $m$ satisfies

(3.4) $\psi(m)\sum c_i = \sum c_i \psi(x_i).$

But this, with $c_i > 0$, is just an algebraic analogue or prologue to the First Theorem of the Mean for integral calculus, the $c_i$ to be replaced by a positive integrable function. Without further specification, this mean $m$ may have an uncountably infinite number of values. But if it be required that $\psi(x)$ be a continuous increasing function, and that $c_i > 0$, then $m$ is unique.

In a series of papers, C. E. Bonferroni [20], [23-27] used means such as $m$ in (3.4) for statistical and actuarial problems. And, as he had in mind [28] distinctly the notion of substitution, he was in a sense a forerunner of Chisini. E. L. Dodd [29] made use of a mean $m$ defined with the aid of $n$ continuous increasing functions $\psi_i(x)$, thus:

(3.5) $\sum c_i \psi_i(m) = \sum c_i \psi_i(x_i), \qquad c_i > 0.$

If $g_i(x) = c_i \psi_i(x)$, this can be written

(3.6) $\sum g_i(m) = \sum g_i(x_i).$

In one paper, C. E. Bonferroni [20], as already noted, used weights which 
might be either positive or negative. 

Some such mean as $m$ in (3.4) has been used by a number of writers. Here $\psi(m)$ is a weighted arithmetic mean of $\psi(x_i)$, and thus it is natural to call $m$ a quasi-arithmetic mean of $x_i$.

Definition. Let $\sum c_i \neq 0$. If $m$ is a solution of

(3.4) $\psi(m)\sum c_i = \sum c_i \psi(x_i),$

then $m$ will be called a quasi-arithmetic mean of $x_i$, with weights $c_i$, and relative to the function $\psi(x)$.

Sufficient conditions for the existence of this mean $m$ are: (1) that $\psi(x)$ be continuous in the interval $I$, finite or infinite, in which the observations $x_i$ lie; (2) that either $c_i > 0$ for each $i$, or that $\psi(x)$ take on all real values as $x$ runs through $I$.
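Under the stronger conditions that $\psi$ is continuous and increasing and every $c_i > 0$, the unique root of (3.4) lies between the smallest and largest $x_i$, so plain bisection finds it. The following Python sketch assumes exactly those conditions; `quasi_arithmetic_mean` is a hypothetical name. The harmonic mean is obtained with $\psi(x) = -1/x$, which is increasing for $x > 0$, rather than with the decreasing $1/x$.

```python
import math
from typing import Callable, Optional, Sequence

def quasi_arithmetic_mean(xs: Sequence[float],
                          psi: Callable[[float], float],
                          weights: Optional[Sequence[float]] = None,
                          rel_tol: float = 1e-13) -> float:
    """Solve psi(m) * sum(c_i) = sum(c_i * psi(x_i)), eq. (3.4), by bisection.

    Assumes psi is continuous and strictly increasing on [min(xs), max(xs)]
    and every weight c_i is positive, so the root is unique and lies in that
    interval.  If all x_i are equal the loop never runs and the common value
    is returned, as a substitutive mean requires.
    """
    if weights is None:
        weights = [1.0] * len(xs)
    target = sum(c * psi(x) for c, x in zip(weights, xs)) / sum(weights)
    lo, hi = min(xs), max(xs)
    while hi - lo > rel_tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):          # no representable midpoint left
            break
        if psi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [1.0, 2.0, 4.0]
print(quasi_arithmetic_mean(xs, lambda x: x))        # arithmetic mean, 7/3
print(quasi_arithmetic_mean(xs, math.log))           # geometric mean, 2
print(quasi_arithmetic_mean(xs, lambda x: -1 / x))   # harmonic mean, 12/7
```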

It will be helpful to picture geometrically the double transformation or mirroring represented by (3.4). Points $x_i$ on the horizontal axis are carried vertically to the curve $y = \psi(x)$ and then reflected horizontally to the $y$-axis. For the points $y_i$ on the $y$-axis thus obtained, the arithmetic mean $\bar{y}$ or "center of gravity" is obtained. Then $\bar{y}$ is carried horizontally to the curve and reflected vertically to the $x$-axis. The abscissas $m$ of points on the $x$-axis thus obtained are means of the given $x_i$, relative to this $\psi(x)$.

It may happen (Dodd [3, p. 746]) that the curve $y = \psi(x)$ contains horizontal segments, as in the curve for temperature $y$ of ice-water-steam which has absorbed a quantity $x$ of heat. In this case the mean $m$ may be an "interval," an uncountable set of real numbers. Indeterminateness over an interval is a well-known feature of the median of an even number of variates. In fact, a paper of D. Jackson [30] was for the purpose of indicating one method of selecting a single value from this interval of indeterminateness, as a median.

It may be noted that a mean of n variables becomes, when n = 1, a function 
of a single variable; and thus it appears possible to implant in a mean of n 
variables almost any peculiarity found in a function of one variable. 

A special case of the quasi-arithmetic mean is the associative mean $m$, which under some general conditions has been shown [2, 21, 22] to satisfy

(3.7) $n\psi(m) = \sum \psi(x_i), \qquad i = 1, 2, \dots, n,$

where $\psi(x)$ is a continuous increasing function.

If $f_n(x_1, x_2, \dots, x_n)$ is an associative mean, then by definition $f_n(x_1, x_2, \dots, x_n)$ is unaltered when any $k$ of the $n$ variates are each replaced by the mean $f_k$ of that set.
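The replacement property can be checked numerically for a mean of the form (3.7). In this sketch $\psi$ and its inverse are supplied explicitly (here $\log$ and $\exp$, giving the geometric mean); `qa_mean` and `replace_by_partial_mean` are hypothetical helpers, and the check is a numerical illustration, not a proof.

```python
import math

def qa_mean(xs, psi, psi_inv):
    """Mean of eq. (3.7): psi(m) = (1/n) * sum(psi(x_i)), with psi increasing."""
    return psi_inv(sum(psi(x) for x in xs) / len(xs))

def replace_by_partial_mean(xs, idx, psi, psi_inv):
    """Replace the variates at the positions in idx by the mean f_k of that set."""
    mk = qa_mean([xs[i] for i in idx], psi, psi_inv)
    return [mk if i in idx else x for i, x in enumerate(xs)]

xs = [1.0, 2.0, 4.0, 8.0, 16.0]
full = qa_mean(xs, math.log, math.exp)        # geometric mean of xs
for idx in [{0, 1}, {1, 3, 4}, {0, 2, 3, 4}]:
    partial = replace_by_partial_mean(xs, idx, math.log, math.exp)
    # differences are at the level of floating-point rounding
    print(sorted(idx), qa_mean(partial, math.log, math.exp) - full)
```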

4. The Gini means as summational. Having distinguished firm means from 
loose means, Gini [19] noted that in the former class, a variate might appear as 
a base, as an exponent, or both as base and exponent. In general, these variates 
are to be positive. Gini then listed ten means of a decidedly broad character, 
some of them generalizing the combinatorial means treated by A. Durand [31] 
and O. Dunkel [32]. See also G. Pietra [37].

These ten means involve only the four simple arithmetic operations and root extraction. For many purposes they are best expressed in the form given by the author. However, to show that these means are summational, logarithms will be used to reduce products to sums.

Let

$S^p = \sum x_i^p, \qquad i = 1, 2, \dots, n;$

${}_nC_c = n!/[c!\,(n - c)!]$, a binomial coefficient;

$P_c$ be any one of the ${}_nC_c$ products of $c$ different elements taken from

(4.1) $x_1, x_2, \dots, x_n;$

$P_c^p = (P_c)^p$, the $p$th power of $P_c$;

$Z_c = \sum P_c$, the sum of all the ${}_nC_c$ products $P_c$;

$Z_c^p = \sum P_c^p$.

In the expressions which follow, it is assumed that the denominators are not 
zero. 

The ten means, as defined in Gini's Equations I, II, ..., X, will be designated here by $m_1, m_2, \dots, m_{10}$; and their logarithms, with base arbitrary, will now be given.

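(4.2)
$$
\begin{aligned}
\log m_1 &= (\log S^p - \log n)/p\\
\log m_2 &= (\log Z_c - \log {}_nC_c)/c\\
\log m_3 &= (\log Z_c^p - \log {}_nC_c)/cp\\
\log m_4 &= (\log S^p - \log S^q)/(p - q)\\
\log m_5 &= \textstyle\sum x_i^p \log x_i / S^p\\
\log m_6 &= (\log Z_c - \log Z_d - \log {}_nC_c + \log {}_nC_d)/(c - d)\\
\log m_7 &= (\log Z_c^p - \log Z_d^p - \log {}_nC_c + \log {}_nC_d)/(c - d)p\\
\log m_8 &= (\log Z_c^p - \log Z_c^q)/c(p - q)\\
\log m_9 &= \textstyle\sum P_c^q \log P_c / cZ_c^q\\
\log m_{10} &= (\log Z_c^p - \log Z_d^q - \log {}_nC_c + \log {}_nC_d)/(cp - dq)
\end{aligned}
$$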

As noted by the author, the foregoing include some well-known special means. Thus $m_1$ is the power mean, which for $p = 1, 2, -1$ becomes respectively the arithmetic mean, the root mean square, and the harmonic mean. If $p \to 0$, then the limit of $m_3$ and of $m_7$ is the geometric mean. If $p = 0, 1, 2$ and $q = p - 1$, then $m_4$ is respectively the harmonic, the arithmetic, and the contraharmonic mean.
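Several of the logarithmic forms (4.2) translate directly into code. The sketch below implements $m_1$, $m_2$, $m_4$ and $m_5$ for positive $x_i$ by brute-force enumeration of the products $P_c$ (adequate only for small $n$) and reproduces the special cases just listed; the function names are illustrative, and `math.prod`/`math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations

def S(xs, p):
    """S^p = sum of x_i ** p."""
    return sum(x ** p for x in xs)

def Z(xs, c, p=1):
    """Z_c^p: sum over all nCc products P_c of c different x's, each raised to p."""
    return sum(math.prod(P) ** p for P in combinations(xs, c))

def gini_m1(xs, p):        # log m1 = (log S^p - log n) / p
    return math.exp((math.log(S(xs, p)) - math.log(len(xs))) / p)

def gini_m2(xs, c):        # log m2 = (log Z_c - log nCc) / c
    return math.exp((math.log(Z(xs, c)) - math.log(math.comb(len(xs), c))) / c)

def gini_m4(xs, p, q):     # log m4 = (log S^p - log S^q) / (p - q)
    return math.exp((math.log(S(xs, p)) - math.log(S(xs, q))) / (p - q))

def gini_m5(xs, p):        # log m5 = sum(x_i^p log x_i) / S^p
    return math.exp(sum(x ** p * math.log(x) for x in xs) / S(xs, p))

xs = [1.0, 2.0, 4.0]
print(gini_m1(xs, 1), gini_m1(xs, 2), gini_m1(xs, -1))        # arithmetic, root mean square, harmonic
print(gini_m4(xs, 0, -1), gini_m4(xs, 1, 0), gini_m4(xs, 2, 1))  # harmonic, arithmetic, contraharmonic
print(gini_m2(xs, 3))                                            # c = n: the geometric mean
```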

For each of the ten means, Gini gives an appropriate name. Those involving binomial coefficients are combinatorial; a mean like the contraharmonic, with denominator other than a constant, is biplanar; the simpler means are monoplanar.

When, in the following, I show that certain combinatorial expressions may be replaced by sums, it is not implied that this replacement would simplify computation.

To prove that $m_1, m_2, \dots, m_{10}$ are all summational means, it may be noted that $n, p, q, c, d, {}_nC_c$, and ${}_nC_d$ are constants. Moreover, $S^p$ is the symmetric sum of the $p$th powers of $x_i$, thus with only one $x_i$ in each term, $i = 1, 2, \dots, n$. And, since $Z_c$, $Z_c^p$, $Z_d$, $Z_d^q$, and the like are symmetric polynomials in the $x_i$ (for the powered forms, in the $x_i^p$ or $x_i^q$), they may be expressed as polynomials in power sums of the form $S^r$, by a well-known theorem of algebra. Hence, among the ten means, the only one that requires special attention is the ninth mean, $m_9$.

To show that mg is a summational mean, we need only examine the numerator 
of the right member. Let this numerator be N. 

(4.3) $N = \sum P_c^q \log P_c.$

Then

(4.4) $qN = (x_1 x_2 \cdots x_c)^q(\log x_1^q + \cdots + \log x_c^q) + \cdots.$

Thus, if we set $y_i = x_i^q$, we may write

(4.5) $qN = (y_1 y_2 \cdots y_c)(\log y_1 + \cdots + \log y_c) + \cdots.$

The coefficient of $\log y_1$ in this right member is the sum of all products of $c$ different factors which include $y_1$.

Now, let $Y_r$ be the sum of the products of $r$ different factors taken from $y_1, y_2, \dots, y_n$, and let $T_r$ be the sum of the products of $r$ different factors taken from $y_2, y_3, \dots, y_n$. Then it is evident that

(4.6) $Y_r = T_r + y_1 T_{r-1}, \qquad T_r = Y_r - y_1 T_{r-1}.$

If, now, we set $Y_0 = T_0 = 1$, it follows that

(4.7) $T_{c-1} = Y_{c-1} - y_1 Y_{c-2} + y_1^2 Y_{c-3} - \cdots + (-1)^{c-1} y_1^{c-1} Y_0.$

Hence, in $qN$, the coefficient of $\log y_1$ is

(4.8) $y_1 T_{c-1} = y_1 Y_{c-1} - y_1^2 Y_{c-2} + \cdots + (-1)^{c-1} y_1^{c} Y_0.$

Thus, in $qN$, the terms containing $\log y_1$ are

(4.9) $Y_{c-1}\, y_1 \log y_1 - Y_{c-2}\, y_1^2 \log y_1 + \cdots + (-1)^{c-1} Y_0\, y_1^{c} \log y_1.$

Now let

(4.10) $U_r = \sum y_i^r \log y_i, \qquad i = 1, 2, \dots, n.$

Then

(4.11) $qN = Y_{c-1} U_1 - Y_{c-2} U_2 + \cdots + (-1)^{c-1} Y_0 U_c.$
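Identity (4.11) is easy to test numerically. The sketch below (hypothetical helper names; positive $x_i$ assumed so that the logarithms exist) evaluates $qN$ once directly from (4.3), by enumerating all $c$-products, and once from the right member of (4.11), built from the sums $U_r$ and the elementary symmetric functions $Y_r$.

```python
import math
from itertools import combinations

def Y(ys, r):
    """Y_r: sum of products of r different factors from ys; Y_0 = 1 (empty product)."""
    return sum(math.prod(P) for P in combinations(ys, r))

def qN_direct(xs, c, q):
    """q * N, with N = sum over all c-products P_c of P_c**q * log(P_c), eq. (4.3)."""
    return q * sum(math.prod(P) ** q * math.log(math.prod(P)) for P in combinations(xs, c))

def qN_summational(xs, c, q):
    """Right member of (4.11): sum_{r=1}^{c} (-1)**(r-1) * Y_{c-r} * U_r, with y_i = x_i**q."""
    ys = [x ** q for x in xs]

    def U(r):  # eq. (4.10): a sum of n terms, each involving a single y_i
        return sum(y ** r * math.log(y) for y in ys)

    return sum((-1) ** (r - 1) * Y(ys, c - r) * U(r) for r in range(1, c + 1))

xs = [0.7, 1.5, 2.0, 3.0, 4.5]
for c in (1, 2, 3, 4):
    print(c, qN_direct(xs, c, 1.5), qN_summational(xs, c, 1.5))   # the two columns agree
```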

Thus $qN$ is here constructed from sums of $n$ terms with but a single $y_i$ in any term.

Likewise, with $y_i$ replaced by $x_i^q$, each term contains but a single $x_i$.

5. Transformations. A function $f(x_1, x_2, \dots, x_n)$ is not in general a mean of its arguments $x_i$. However, it is often possible to make a substitution $x_i = \phi(y_i)$ so that

(5.1) $f[\phi(y_1), \phi(y_2), \dots, \phi(y_n)] = g(y_1, y_2, \dots, y_n)$

is a mean of its arguments $y_i$.

The required substitution is sometimes obvious, as in the case of the estimate $s$ of scale

(5.2) $s = \left[(1/n)\sum (x_i - m)^2\right]^{1/2} = \left[(1/n)\sum y_i^2\right]^{1/2}, \qquad y_i = x_i - m.$

Here $s$ is a mean of $y_i$, although it is not a mean of $x_i$.

Definition. Let $y = \psi(x)$, in general multiple valued, be defined in an interval $I$, finite or infinite, the values of $y$ lying in an interval $J$. Suppose that for each $y$ in $J$ there is at least one $x$ in $I$ such that $\psi(x) = y$. Let any such $x$ be designated by $\phi(y)$. Then $\phi(y)$ will be called the inverse of $\psi(x)$. It follows that one value of

(5.3) $\psi[\phi(y)] = y.$
Theorem. Let

(5.4) $z = f(x_1, x_2, \dots, x_n),$

in general multiple valued, be defined when each $x_i$ is in some interval $I$, finite or infinite. With $x$ in $I$, set

(5.5) $\psi(x) = f(x, x, \dots, x);$

and suppose that $y = \psi(x)$ has an inverse, $x = \phi(y)$, defined in $J$. Let $x_i = \phi(y_i)$ be substituted into $f$ to form the function

(5.6) $w = f[\phi(y_1), \phi(y_2), \dots, \phi(y_n)] = g(y_1, y_2, \dots, y_n).$

Then $w$ is a mean of $y_i$, defined when $y_i$ is in $J$. It is thus a mean of $\psi(x_i)$, where $x_i$ is in $I$.

If, further, $\psi(x)$ is a continuous increasing function of $x$, then for a given set of $x_i$ the values of $z$ and $w$ are identical. The same is true for a given set of $n$ values $y_i$.
Proof. If each $y_i = c$, a number in $J$, then

(5.7) $f[\phi(y_1), \dots, \phi(y_n)] = f[\phi(c), \dots, \phi(c)] = \psi[\phi(c)].$

And one value of $\psi[\phi(c)]$ is $c$, from the definition of the inverse function $\phi(y)$. Moreover, if a number $c'$ is taken in $I$, then $\psi(c')$ is some number in $J$, which we may call $c$; and the argument above is applicable. Finally, if $\psi(x)$ is continuous and increasing, then a number $x_i$ in $I$ is associated with one and only one $y_i$ in $J$, and vice versa. Thus $w$ and $z$ become identical.

In the foregoing, we started with $f$, which is not a mean of its arguments $x_i$, and obtained $g$, which is a mean of $y_i$. Something like the reverse of this is possible. The last member of (5.2) is a mean of $y_i$. It was obtained by treating $m$ as a constant with respect to $x_i$. If, however, $m$ is an estimate for location and is taken as $(1/n)\sum x_i$, and this is substituted into (5.2), then

(5.8) $s = \left\{[(n - 1)/n^2]\sum x_i^2 - (2/n^2)\sum x_i x_j\right\}^{1/2}, \qquad i < j.$

This $s$ is now not a mean of $x_i$; for if each $x_i$ equals any constant $c$, then $s = 0$. Furthermore, there exists no single-valued continuous increasing function $x = \phi(y)$ such that, if $x_i = \phi(y_i)$ is substituted into (5.8), $s$ will be a mean of the $y_i$. Thus the elimination of $m$ from (5.2) interferes with the status of $s$ as a mean of the $x_i$.
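To see this contrast concretely, the following sketch (hypothetical function names) evaluates (5.2) with $m$ held fixed and (5.8) with $m$ eliminated. With $m$ set equal to the sample mean the two agree, but only the first returns the common value when all its arguments $y_i$ are equal.

```python
import math

def s_fixed_location(xs, m):
    """(5.2) with m held constant: the root-mean-square of y_i = x_i - m."""
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def s_location_eliminated(xs):
    """(5.8): m replaced by (1/n) * sum(x_i) and eliminated."""
    n = len(xs)
    sum_sq = sum(x * x for x in xs)
    sum_cross = sum(xs[i] * xs[j] for i in range(n) for j in range(i + 1, n))
    # Clamp tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(0.0, ((n - 1) * sum_sq - 2 * sum_cross) / n ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
print(s_location_eliminated(xs), s_fixed_location(xs, sum(xs) / len(xs)))  # agree up to rounding
print(s_location_eliminated([5.0, 5.0, 5.0]))   # 0.0: not a mean of the x_i
print(s_fixed_location([5.0, 5.0, 5.0], 0.0))   # 5.0: with every y_i = 5, s = 5
```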


6. Indeterminate Forms that arise in testing for Means. Sometimes a function $f$ is substantially continuous, but the investigation leading to the function fails to assign to the function a value for certain values of the argument $x$, or arguments $x_1, x_2, \dots, x_n$. However, values are often assignable which will make the function continuous. This is the usual occurrence when, in curve fitting, parameters are estimated. In general, the measurements are assumed to be not all alike. However, when a general function such as $\sum x_i/n$ for location is obtained, we do not hesitate to assign to this function the value $c$ when each $x_i = c$, to make the function continuous.

As another illustration of "indeterminate forms," consider the Jackson [30] median $M$ of four numbers $x_1 \le x_2 \le x_3 \le x_4$, viz.,

(6.1) $M = \dfrac{x_4 x_3 - x_2 x_1}{x_4 + x_3 - x_2 - x_1}.$

A direct substitution of $x_i = c$ renders $M$ indeterminate. But if $x_i \to c$, indeed, if merely $x_2 \to c$ and $x_3 \to c$, so also does $M$.
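A small numerical illustration of (6.1), assuming ordered inputs; `jackson_median` is a hypothetical name. It shows $M$ tending to $c$ as $x_2$ and $x_3$ tend to $c$ with $x_1$ and $x_4$ fixed, and it reports the indeterminate case instead of dividing by zero.

```python
def jackson_median(x1, x2, x3, x4):
    """Jackson's median of four ordered numbers, eq. (6.1).

    Raises ValueError in the indeterminate case x1 = x2 = x3 = x4, where the
    text assigns the common value by continuity.
    """
    if not (x1 <= x2 <= x3 <= x4):
        raise ValueError("arguments must satisfy x1 <= x2 <= x3 <= x4")
    den = x4 + x3 - x2 - x1
    if den == 0:
        raise ValueError("indeterminate form 0/0: all four values are equal")
    return (x4 * x3 - x2 * x1) / den

print(jackson_median(0, 1, 3, 10))          # 2.5, between x2 and x3
for h in (1e-1, 1e-3, 1e-6):                # x2, x3 -> 2 with x1, x4 held fixed
    print(h, jackson_median(0, 2 - h, 2 + h, 10))
```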

In a recent paper, R. Cisbani [33] generalizes means suggested by Dunkel [32] and L. Galvani [34] by setting up

(6.2) $y_j(x) = \left[n^{-1} \sum_{i=1}^{n} \left(a + i\,\dfrac{b - a}{n}\right)^{x/j}\right]^{j/x}, \qquad j \neq 0, \quad x \neq 0,$

and letting $n \to \infty$. There results an integral with the value

(6.3) $y_j(x) = \left[\dfrac{b^{x/j + 1} - a^{x/j + 1}}{(x/j + 1)(b - a)}\right]^{j/x}$

for the case $x \neq -j$. This mean, set up as a mean of an infinite number of variates, turns out to be also a mean of the two numbers $a$ and $b$, which for $b = a$ becomes indeterminate. But as $b$ approaches $a$, so also does $y_j(x)$ approach $a$. This is also true for the special cases $x = -j$, etc.

In testing to see if a function $m$ of $x_i$ is a mean of these numbers, a difficulty sometimes arises, because a substitution of $x_i = c$ and $m = c$ into the equation
which implicitly defines $m$ will put zeros into denominators. An aid in such testing will now be formulated as a theorem, although the ideas involved are not essentially new.

Theorem. Let $f(x)$ be a continuous increasing function of $x$ defined for each real $x$. Let

(6.4) $f(0) = 0.$

Given $n$ real distinct numbers

(6.5) $x_1 < x_2 < \cdots < x_{n-1} < x_n,$

$n$ positive numbers $k_i$, and a real number $C$. Set

(6.6) $F(x) = \dfrac{k_1}{f(x_1 - x)} + \cdots + \dfrac{k_n}{f(x_n - x)} - C.$

Then $F(x) = 0$ has $n - 1$ real roots $m_i$, such that

(6.7) $x_1 < m_1 < x_2 < m_2 < \cdots < m_{n-1} < x_n;$

also, a root less than $x_1$, provided

(6.8) $\sum k_i/f(+\infty) < C,$

or a root greater than $x_n$, provided

(6.9) $\sum k_i/f(-\infty) > C.$

Proof. Since $f(x)$ is a continuous increasing function of $x$, so also is $k_i/f(x_i - x)$, except for the single value $x = x_i$. So also, then, is $F(x)$, except when $x = x_1$ or $x_2$ or $\cdots$ or $x_n$. But

(6.10) $F(x_i + 0) = -\infty, \qquad F(x_{i+1} - 0) = +\infty.$

Hence, between $x_i$ and $x_{i+1}$ there exists a root $m_i$ of $F(x) = 0$.

Moreover, since

(6.11) $F(-\infty) = \left[\sum k_i/f(+\infty)\right] - C, \qquad F(x_1 - 0) = +\infty,$

it follows that there is a root less than $x_1$, provided (6.8) is satisfied. Likewise, there is a root greater than $x_n$ if (6.9) is satisfied.

The use of this theorem in testing for means is simple. Keeping the $x_i$ distinct, the equation $F(x) = 0$ determines $n - 1$ numbers $m_j$ such that if $x_i \to c$, so also do these $m_j \to c$. Employing continuity to define $m_j$ when each $x_i = c$, we may say that each $m_j$ is a mean of $x_i$ ($j = 1, 2, \dots, n - 1$; $i = 1, 2, \dots, n$) when the conditions of this theorem are satisfied. If $F(x) = 0$ has still another root $m$, this $m$ will not in general be a mean of $x_i$.
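The proof suggests a direct numerical procedure: on each gap $(x_j, x_{j+1})$ the function $F$ increases from $-\infty$ to $+\infty$, so bisection isolates the root $m_j$. The sketch below assumes the hypotheses of the theorem ($k_i > 0$; $f$ continuous, increasing, $f(0) = 0$); `interior_roots` is a hypothetical helper, and the roots outside $[x_1, x_n]$ covered by (6.8) and (6.9) are not searched for.

```python
from typing import Callable, List, Sequence

def interior_roots(xs: Sequence[float], ks: Sequence[float], C: float,
                   f: Callable[[float], float] = lambda t: t) -> List[float]:
    """Roots m_j of F(x) = sum(k_i / f(x_i - x)) - C in each gap (x_j, x_{j+1}), eq. (6.7).

    Assumes x_1 < ... < x_n, every k_i > 0, and f continuous and increasing
    with f(0) = 0.  F then increases from -inf to +inf on each gap, so plain
    bisection finds the single root there.
    """
    def F(x):
        return sum(k / f(xi - x) for xi, k in zip(xs, ks)) - C

    roots = []
    for lo, hi in zip(xs, xs[1:]):
        while True:
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:   # gap exhausted at float precision
                break
            if F(mid) < 0:               # root lies to the right of mid
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

xs, ks = [1.0, 2.0, 4.0, 7.0], [1.0, 1.0, 2.0, 1.0]
for fn in (lambda t: t, lambda t: t ** 3):
    print(interior_roots(xs, ks, 0.5, fn))   # one root strictly inside each gap
```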


7. Summational Means arising in the Estimation of Parameters of Frequency Distributions. In curve fitting, the estimation of parameters leads in general to summational means. If the method of moments is used, the first step is to find the moments by summation. I have already considered estimates for location and scale by this method [7], and by the R. A. Fisher method of maximum likelihood [4]. A further study of the results of the likelihood method will now be made.

By this method, products which first appear are reduced to sums by logarithms, and the means found are, in general, summational. Some idea of the forms of these means can be obtained by examining a rather general form of frequency function which includes the Pearson Type I and involves parameters with estimates $p > 0$ and $q > 0$, in addition to the location $m$ and scale $a$. Let the observations be $x_1, x_2, \dots, x_n$; let

(7.1) $t_i = (x_i - m)/a; \qquad 0 \le t_i \le 1; \qquad a > 0;$

(7.2) $y = \dfrac{1}{a}\,\dfrac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)}\, t^{p-1}(1 - t)^{q-1}.$

The likelihood $L$ is obtained by multiplying together the $n$ factors obtained by substituting $t = t_1, t_2, \dots, t_n$. Then

(7.3) $\log L = -n \log a + n \log \Gamma(p + q) - n \log \Gamma(p) - n \log \Gamma(q) + (p - 1)\sum_{i=1}^{n} \log t_i + (q - 1)\sum_{i=1}^{n} \log(1 - t_i).$
1 1 

From $\partial L/\partial m = 0$, there is obtained

(7.4) $P \sum \dfrac{1}{x_i - m} + Q \sum \dfrac{1}{x_i - m - a} = 0; \qquad P = p - 1, \quad Q = q - 1.$

Suppose $P \neq 0$ and $Q \neq 0$; and, as a first case, suppose $P + Q \neq 0$. If each $x_i$ is replaced by $x$, the above equation leads to $m = x - Pa/(P + Q)$. Then $m$ is a summational mean of

(7.5) $x'_i = x_i - Pa/(P + Q), \qquad i = 1, 2, \dots, n,$

as seen by applying the Theorem in Section 5.

Likewise, $a$ is a summational mean of

(7.6) $x''_i = (x_i - m)(P + Q)/P.$


If $P \neq 0$, $Q \neq 0$, but $P + Q = 0$, then (7.4) becomes

(7.7) $\sum \dfrac{1}{x_i - m - a} = \sum \dfrac{1}{x_i - m}.$

Now set $y_i = x_i - m$, $C = \sum 1/y_i$, and write (7.7) as

(7.8) $F(a) = \sum \dfrac{1}{y_i - a} - C = 0.$

This has the form given in (6.6) with $x$ replaced by $a$, $k_i = 1$, $f(a) = a$. If then $y_1 < y_2 < \cdots < y_n$, there exist $n - 1$ solutions $a_j$ of $F(a) = 0$ between $y_1$ and $y_n$. And thus, keeping the $y_i$ distinct, if $y_i \to c$, so also do the $a_j \to c$. These $a_j$ are then means of $y_i$, and thus means of $x_i - m$.

In the more general case where $P + Q \neq 0$, it is seen also that $Q$ is a summational mean of

(7.9) $P\left[\dfrac{a}{x_i - m} - 1\right].$

From $\partial L/\partial a = 0$, quite analogous results are obtained. The special case, now, however, is given by $P + Q + 1 = 0 = p + q - 1$. And, with the continuity interpretation, $a$ is a mean of $x_i - m$; and, moreover, $m$ is a mean of $x_i - a$.

Using now the digamma function

(7.10) $f(u) = \dfrac{d}{du}\log \Gamma(u),$

set

(7.11) $D(p) = f(p + q) - f(p).$

The condition $\partial L/\partial p = 0$ then leads to

(7.12) $D(p) = (1/n)\sum (-\log t_i), \qquad 0 < t_i \le 1.$

Now, with $q > 0$, $D(\infty) = 0$ and $D(0 + 0) = \infty$; and $D(p)$ is a continuous decreasing function of $p$ when $p > 0$. Then, since $-\log t_i \ge 0$, there is a unique $p > 0$ to satisfy (7.12) whenever some $t_i < 1$. The $p$ thus found is a mean of $D^{-1}(-\log t_i)$, where $D^{-1}$ is inverse to $D$.
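Equation (7.12) can be solved for $p$ by bisection because $D$ is continuous and decreasing. Python's standard library has no digamma function, so this sketch approximates it with the recurrence $f(u) = f(u + 1) - 1/u$ followed by the standard asymptotic series (SciPy's `scipy.special.digamma` could be used instead). The names `digamma`, `D` and `solve_p` are illustrative; the inputs are assumed to satisfy $q > 0$ and $0 < t_i \le 1$ with at least one $t_i < 1$. For $q = 1$, $D(p) = 1/p$ exactly, which gives a convenient check.

```python
import math
from typing import Sequence

def digamma(x: float) -> float:
    """Digamma f(u) = d/du log Gamma(u) for u > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 10.0:                  # f(x) = f(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def D(p: float, q: float) -> float:
    """Eq. (7.11): D(p) = f(p + q) - f(p), continuous and decreasing in p > 0 for q > 0."""
    return digamma(p + q) - digamma(p)

def solve_p(ts: Sequence[float], q: float) -> float:
    """The unique p > 0 with D(p) = (1/n) * sum(-log t_i), eq. (7.12), by bisection."""
    target = sum(-math.log(t) for t in ts) / len(ts)
    lo = hi = 1.0
    while D(lo, q) < target:         # D -> infinity as p -> 0+
        lo /= 2.0
    while D(hi, q) > target:         # D -> 0 as p -> infinity
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if D(mid, q) > target:       # D decreasing: the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(solve_p([0.5, 0.25, 0.9], 2.0))
```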


For the frequency function

(7.13) $y = \dfrac{1}{a\,\Gamma(p + 1)}\, t^{p} e^{-t}, \qquad t = (x - m)/a, \quad p > -1,$

it is found that $m$ is the arithmetic mean of $x_i - a(p + 1)$, that is,

(7.14) $(1/n)\sum x_i = m + a(p + 1),$

and that $ap$ is the harmonic mean of $x_i - m$.

8. Generalizations. The extension of results from the discrete or discontinuous case, where a mean $m$ depends upon only a finite number of elements, to the continuous case is fairly immediate, with integration taking the place of summation, and a distribution or frequency function taking the place of discrete weights $c_i$. Stieltjes and Lebesgue integrals may be used as well as Riemann integrals. Such a generalization of the Chisini mean was given by de Finetti [2].

The summational mean, which I have defined as involving possibly several summations, may be generalized likewise.

In terms of set functions, sometimes called functionelles, I gave [35] the following general definition of a mean, with a point set $H$ in mind as a distribution function.

Definition. Let $E$ and $H$ be sets of numbers. Such a number $t$ may be a real number or a vector number $t = (t_1, t_2, \dots, t_k)$.

Let $E_t$ be the result of replacing each number of $E$ by a single number $t$.

Then the mean $m$ of numbers in $E$, relative to the set $H$ and to a function $f$, is given by $m = f(E, H)$, provided that the function $f$ has been so constructed that, for each $t$ in $E$, $f(E_t, H) = t$, or at least one value of this $f$ is $t$. It is to be understood above that when $E$ is changed to $E_t$, the set $H$ remains unaltered.

This retains the chief feature of $f(t, t, \dots, t) = t$ in explicit form, or of $f(t, t, \dots, t) = f(t_1, t_2, \dots, t_n)$ in implicit form, where $t$ is a mean of $t_1, t_2, \dots, t_n$.

I used [36] a somewhat less general definition to discuss regression coefficients. All such means may well be called substitutive or representative.

THE PRODUCT SEMI-INVARIANTS OF THE MEAN AND A CENTRAL MOMENT IN SAMPLES

By Cecil C. Craig


The method developed by the author for calculating the semi-invariants and product semi-invariants of moments in samples from any infinite population¹ is not immediately applicable to the calculation of product semi-invariants of the mean and a central moment in such samples. In the present paper this method is adapted for this purpose, so that the calculation of these product semi-invariants becomes routine. As will be seen, the computing is a little heavier than in the case of central moments alone for results of equal weight. A table of results up to weight ten for the mean and the second, third and fourth central moments is given. The author plans to apply these to a further study of the sampling characteristics of the coefficient of variation and Fisher's t in samples from non-normal populations.

Let a random sample, $x_1, x_2, \dots, x_N$, of $N$ observations be drawn at random from an infinite population characterized by the semi-invariants $\lambda_1, \lambda_2, \lambda_3, \dots$. The sample mean is

$\bar{x} = \sum_{i=1}^{N} x_i/N,$

and the $n$th central moment of the sample is

$m_n = \sum_{i=1}^{N} (x_i - \bar{x})^n/N.$

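Then the product semi-invariants of order $M$ of $\bar{x}$ and $m_n$, $S_{kl}(\bar{x}, m_n)$, are defined by the formal identity in the parameters $\vartheta$ and $\omega$:

(1) $(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \cdots + \dfrac{1}{M!}(S_{10}\vartheta + S_{01}\omega)^{(M)} + \cdots \equiv \log E\left(e^{\bar{x}\vartheta + m_n\omega}\right),$

in which $E$ denotes the mathematical expectation over the set of all such samples and

$(S_{10}\vartheta + S_{01}\omega)^{(M)} = \sum_{s=0}^{M} \binom{M}{s} S_{s,M-s}(\bar{x}, m_n)\,\vartheta^{s}\omega^{M-s}.$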


¹ "An Application of Thiele's Semi-invariants to the Sampling Problem," Metron, Vol. VII, part IV (1928), pp. 3-75.

If we denote $E(\bar{x}^k m_n^l)$ by $M_{kl}$, we have by definition the further formal identity in $\vartheta$ and $\omega$:

$E\left(e^{\bar{x}\vartheta + m_n\omega}\right) \equiv 1 + (M_{10}\vartheta + M_{01}\omega) + \dfrac{1}{2!}(M_{10}\vartheta + M_{01}\omega)^{(2)} + \cdots,$

in which $(M_{10}\vartheta + M_{01}\omega)^{(M)}$ is to be expanded in the same manner as $(S_{10}\vartheta + S_{01}\omega)^{(M)}$ above.

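Let us write

$\delta_i = x_i - \bar{x},$

and then

(2) $E\left(e^{\bar{x}\vartheta + m_n\omega}\right) = E\left(e^{(\sum x_i)\vartheta/N + (\sum \delta_i^n)\omega/N}\right).$

(Summations with respect to $i$ and $j$ always run from 1 to $N$.) Now we define a new set of product semi-invariants, $\lambda_{k l_1 l_2 \cdots l_N}$, of the sum $\sum x_i$ and the $N$ $\delta_i$'s, by means of

$(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i)^{(2)} + \cdots \equiv \log E\left(e^{\vartheta \sum x_i + \sum \omega_i \delta_i}\right),$

in which, for example,

$(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i)^{(2)} = \lambda_{2000\cdots}\vartheta^2 + 2\lambda_{1100\cdots}\vartheta\omega_1 + 2\lambda_{1010\cdots}\vartheta\omega_2 + \cdots + \lambda_{0200\cdots}\omega_1^2 + \lambda_{0020\cdots}\omega_2^2 + \lambda_{0002\cdots}\omega_3^2 + \cdots.$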

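We may set

$\delta_i = \sum_{j=1}^{N} \alpha_{ij} x_j, \qquad \text{with } \alpha_{ii} = \dfrac{N - 1}{N}, \quad \alpha_{ij} = -\dfrac{1}{N} \ (i \neq j).$

Then

$\vartheta \sum x_i + \sum \omega_i \delta_i = \sum_j a_j x_j,$

in which

$a_j = \vartheta + \sum_i \alpha_{ij}\omega_i.$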

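It follows then that

$(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i)^{(2)} + \dfrac{1}{3!}(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i)^{(3)} + \cdots \equiv \lambda_1 \sum a_j + \lambda_2 \dfrac{\sum a_j^2}{2!} + \lambda_3 \dfrac{\sum a_j^3}{3!} + \cdots,$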





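from which

$(\lambda_{10}\vartheta + \sum \lambda_{0_i}\omega_i)^{(k)} = \lambda_k \sum_j \left(\vartheta + \sum_i \alpha_{ij}\omega_i\right)^k.$

From this

$\lambda_{100\cdots0} = \lambda_{10} = N\lambda_1, \qquad \lambda_{010\cdots0} = \lambda_{0010\cdots0} = \cdots = 0,$

and generally²

(3) $\lambda_{k l_1 l_2 \cdots l_N} = \dfrac{\lambda_{k+l}}{N^{l}}\left[\sum_{j=1}^{N} (-1)^{l - l_j}(N - 1)^{l_j}\right] \qquad (l_1 + l_2 + \cdots + l_N = l).$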

This is the first result to be used in calculating values of the $S_{kl}$'s. Note that the value of $\lambda_{k l_1 l_2 \cdots l_N}$ is independent of the order in which a given set of $l_i$'s occurs.

Calculation of particular $\lambda_{k l_1 l_2 \cdots l_N}$'s in terms of $N$ and the semi-invariants of the sampled population is both simple and rapid, as one may see from a pair of examples:

$\lambda_{22} = \lambda_{202} = \lambda_{2002} = \cdots$ (suppressing superfluous zeros in the subscripts)

$\quad = \dfrac{\lambda_4}{N^2}\left[(N - 1)^2 + (N - 1)\right] = \dfrac{N - 1}{N}\lambda_4.$

Then, too,

$\lambda_{k2} = \dfrac{N - 1}{N}\lambda_{k+2}.$

For a second example:

$\lambda_{k13} = \dfrac{\lambda_{k+4}}{N^4}\left[-(N - 1)^3 - (N - 1) + (N - 2)\right] = -\dfrac{\lambda_{k+4}}{N^3}\left(N^2 - 3N + 3\right).$
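Formula (3) and the two worked examples can be checked in exact rational arithmetic. The sketch below (hypothetical helper names) computes the factor multiplying $\lambda_{k+l}$ both from (3) and directly as $\sum_j \prod_i \alpha_{ij}^{l_i}$ with the $\alpha_{ij}$ given above.

```python
from fractions import Fraction
from typing import Sequence

def lambda_coefficient(ls: Sequence[int], N: int) -> Fraction:
    """Factor multiplying lambda_{k+l} in (3): N**(-l) * sum_j (-1)**(l - l_j) * (N - 1)**l_j.

    ls lists the leading l_i; the remaining entries up to N are taken as zero.
    """
    full = list(ls) + [0] * (N - len(ls))
    l = sum(full)
    return Fraction(sum((-1) ** (l - lj) * (N - 1) ** lj for lj in full), N ** l)

def lambda_coefficient_direct(ls: Sequence[int], N: int) -> Fraction:
    """Same factor as sum_j prod_i alpha_ij**l_i, with alpha_ii = (N-1)/N and alpha_ij = -1/N."""
    full = list(ls) + [0] * (N - len(ls))
    total = Fraction(0)
    for j in range(N):
        term = Fraction(1)
        for i, li in enumerate(full):
            alpha = Fraction(N - 1, N) if i == j else Fraction(-1, N)
            term *= alpha ** li
        total += term
    return total

N = 10
print(lambda_coefficient([2], N))      # 9/10, i.e. lambda_{k2} = (N - 1)/N * lambda_{k+2}
print(lambda_coefficient([1, 3], N))   # -73/1000 = -(N**2 - 3N + 3)/N**3
```

With no nonzero $l_i$ the factor is $N$, and with a single $l_i = 1$ it is $0$, matching $\lambda_{10} = N\lambda_1$ and $\lambda_{010\cdots0} = 0$.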

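Now the semi-invariants $S_{kl}$ can be expressed directly in terms of the product moments $\nu_{k l_1 l_2 \cdots l_N}$ of the sum $\sum x_i$ and the $N$ $\delta_i$'s. These product moments are given by the appropriate moment generating function:

$E\left(e^{\vartheta \sum x_i + \sum \omega_i \delta_i}\right) = 1 + (\nu_{10}\vartheta + \sum \nu_{0_i}\omega_i) + \dfrac{1}{2!}(\nu_{10}\vartheta + \sum \nu_{0_i}\omega_i)^{(2)} + \cdots.$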


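² As written, this result is valid if at least one of the $l_i$'s is zero, which is always the case if $N$, the size of the sample, is greater than $l$. (Cf. the author's paper cited above, p. 17.)

Then it is seen that

$E\left(e^{(\sum x_i)\vartheta/N + (\sum \delta_i^n)\omega/N}\right) = 1 + \dfrac{1}{N}\left[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega\right] + \dfrac{1}{2!\,N^2}\left[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega\right]^{(2)} + \cdots,$

in which

$\left[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega\right]^{(2)} = \nu_{20}\vartheta^2 + 2(\nu_{1n} + \nu_{10n} + \nu_{100n} + \cdots)\vartheta\omega + (\nu_{0,2n} + \nu_{00,2n} + \nu_{000,2n} + \cdots + 2\nu_{0nn} + 2\nu_{0n0n} + \cdots)\omega^2,$

etc.; and by comparison with (1) and (2), we have

$(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \cdots \equiv \log\left\{1 + \dfrac{1}{N}\left[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega\right] + \dfrac{1}{2!\,N^2}\left[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega\right]^{(2)} + \cdots\right\}.$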

From this 
(Siod' •+■ 

_ 1 ^ (-l)’’-^(p-l)l(fe + Olhot? + (2>'o.n.)a>]^{[>'io>? + (Svo.Jg^j^^r . ■. 
iV*+' (1!)--(2I)'... r!sl.. • ' 

in which 

r + s + t + = p, 

the summation extending over all partitions (r2’3‘ ■■•) of Ik + 1. This, of 
course, is only the usual formula for semi-invariants in terms of moments appro¬ 
priately modified. In particular, 

(Slot? + 5oiw)® = {[rioJ> 4- (Svo.bOw]® - [riQ^ + (Sj-c.b.)!.;]*} 

If we write 


[>'101? -|- (S>'o,m)w] = W 

(gj (^ioi> + iSoiw)^" = 1 (F'® - 3F®F + 2F®) 

(Aoi? + (Soiw)^® = - 4F®F - 3(F‘®)* + 12F^®F® - 6F^]. 

Now the can be replaced by their values in terms of the 'KkhH 

the details of which will be explained below, and it will be evident that any 
is unaltered by a permutation of the 2,’s in its subscript. Taking 
account of this, the formulae (5) may be written in the expanded forms; 

1 



PRODUCT BUMI-INVARIANTS 


181 


Snix, nin) = ^ ["211 PMPOn ~ ^VlnVlO + 2 vmPQ„] 

Sli(x, trin) = ^2 [p 1 , 27 v + {N — l)Pu„ — PloVo.ln — {N — l)vioVontl 

— 2NvinVQn + 2iV^PloPOn] 

Bul, with no losB in generality, the origin may be taken at the population mean 
ao that Xi = 0. In this case it will be found that rjo = 0 and these formulae 
become; 

Snix, m„) = vin/N 

S2l(Xf Wti) — \y2n PiQPOn] 


Sn(x, m„) = ^ [vi, 2 n + (N - l)ri„„ - 2iVri„ron] 

^31 (k, Win) — [pSn PSQPQn 3plnP2o] 

( 6 ) _ ^ 

^22(‘^j ?W;j) [j^2|2n H" ^2Q^0,2n 

—' (iV — l)v20V0nn “ ^Nvln + 2Nv2Qvln] 

Sliix, mn) = [ri.Sn + 3(iV — l)j'i. 2 n,n + (JV — 1)(JV’ — 2)ri„„„ 

— 3Af('i,2nI'0n — 3N{N — l)vinnV 0 n — 3iVri„J'o,2n 

— 3N{N — l)ri„I'07.n + QN\„vln\- 

These formulae are the second result used in the actual calculation of 
Skiix, m„)’s. One begins with them, putting in the particular value of n for 
the central moment in question. If for instance we wish to compute the product 
semi-invariants of the mean and variance in samples of N, we begin with the 
set of formulae; 


(7) 


^Sii(a:, mi) = vn/N 
)S2i(.'E, WI 2 ) = [^22 ~ >'20>'02] 

i/ 

Sllix, m 2 ) = ^ [j'14 -|- (iV — l)pi22 ~ 2iVj'12ro2], 


etc 

The second step is to replace the product moments mih ‘x which appear by 
their values in terms of the corresponding product semi-invanants. This process 
can perhaps be best explained by some examples. 
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Consider the complete calculation of Su(x, mt) From the expression for the 
fifth central moment in terms of semi-invariants. 


rs = Xs + 10 X 5 X 2 , 

we can write the corresponding expression for product moments m terms of 
product semi-invariants 


( 8 ) 


(Si/A)'"’ = (SXA)® + 10(2)XA)®(2X,iy.)‘"^- 


Then we get vu by comparing coefficients of and ^122 by comparing eoeffi- 

h n2 q2 

cients of in this identity For an index as low as 5, these coefficients 

are readily picked out by inspection; for larger indices the use of Hammond 
operators reduces this to a mechanical i-outine.’' In this case we have 


D,D^{U) = (12)(02) -h (03)(11). 


To the terms on the right the appropriate binomial coefficients must be applied 
giving 

3(12)(02) -t- 2(03)(11). 

51 

The total of these coefficients is 5 = jjYj, a necessary check. Then multi¬ 
plying these coefficients by 10/5, we have 

6X12X02 “t" 4X03X11 

for the required coefficients in the second term in (8). Thus 

>'n — Xi 4 4 - (6X12X02 4 “ 4X03X11). 


The two terms in parentheses arise from the same term in (8) and would both 
give rise to terms in XaXo in the final result if Xn were not identically zero from (3). 
In practice all terms in which X^ is a factor are crossed out as they appear. 
Next 


AI>2(122) = 2(12)(02) 4- (111)(011) 4“ 2(021)(11). 

(X 002 = X 02 ; X 012 = X 021 .) With the binomial, or multinomial coefficients attached, 
the right member is rewritten 

6(12)(02) 4- 12(111)(011) 4- 12(02i)(ll). 


' Cf. the author, loc. cit., p, 24.' 
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o! ' 

The total of these coefficients is 30 = '2 \2\ i \ ' multiplying each coeffi¬ 

cient by 10/30, we have 

Via = Xi22 "h (2X12X02 -f" 4XiiiXon 4X012X11). 

Going on with the calculation of Sii(x, m 2 ): 

Vl2 = Xl2 , 1*02 = X02 , 


and then we have: 


(Sia(.'C, mi) = ^ [{Xu {N — l)Xi22} 


{6X12X02 {N — 1 )(2X12X02 + 4 X]iiXoii) — 2 iVXi 2 X|) 2 )]. 

The first set of terms within braces gives rise to terms in X 5 ; the second to terms 
in XaXo. Next 


Xu 


(Af - 1 )(N^ - 3 N-f 3 ), 

--Am 


X 3 

N 


X122 = Xo 

Jy^ 

, (iV-l)(N-2), 

Ao3 = -“ A 3 


X021= 


X02 


N - 1 
N 


X 2 


Xon — — 


N' 


This table of values will be of frequent use in further calculations of Su’f^ 
Giving the values of both Xm and Xqu here, was unnecessary duplication. 
Now only the final reduction is to be carried out. We obtain 


Snix, mi) = ■ [(IV — 1 )X 6 -|- 4iVX,2X2]. 


This result of order 3 and of weight 5 follows a quite mechanical procedure 
and is quite brief. The length of the algebraic computations required grows 
rapidly as the weight is increased but for weights no greater than 10 undue labor 
is not required. For greater weights only time and patience is required to get 
results if they are needed. It is to be noted that by this method one may 
calculate individual terms in the result without doing any of the work required 
for the remaining terms and that one may readily shorten the work by getting 
results to a desired degree of approximation with respect to powers of 1/iV. 

There follows a table of the results so far calculated. 
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For rt = 2: 
i\^- 1, 


Sn = 

(Sai = 

^1. = 


N 

N-1 

N-1 




[(N — 1)X5 + 4iVX3Xj 


&.. ^ X. 

S 22 = ~ iN(S.iXz + Xj)] 

Si3 = UN - l)^Xv + 12NiN - DXsXa + 4N(5N - 7)Xa3 + 24iV'XaXl] 
It is not difficult to see that in general 

For n = 3: 

^ _(N-lKN-2), 

- m - 

. __ (N - 1){N - 2) ^ 

021 Xe 

Sn = ~ [(iV - DdAT - 2)X7 + W(iV - 2)XaX2 

+ 27N{N - 2jX,\a + ISA^^X^X^] 

„ _ (iV- l)(iV-2)^ 

-831-Xc 

XS 22 = ~ f(iV - l)(iV - 2)Xs + miN - 2 )mM 

+ 3&NiN - 2)Xr,X3 + 27N(N - 2)Xl + lH/Y‘’XiX2 + 3mhhz] 

s,, = [iV(iV - 1)^(JV - 2)^10 

+ 9(i\I - l)(3iV' 12A1" + 12Ar' - 5iV + 5)X8X2 

+ 27iV(4iV* - 21iV' + 36iV" - 20A^ + 3)X7X3 
+ 27N\N - 2)\7N - 11)XbX4 + 54:N\N - 2)(4AI ~ 7)XcX^ 
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H- 21N\N - 2)^(4iV - 7)X^ + m\N - 2)(23]V - 50)X6X3X2 
-I- 162A^'(iV - 2){^N - 12)xJX 2 + 54A^'(29iV' - 126iV + 140)X4X3 
-I- 108iV*(5iV - 12)X4X2 + 324iV‘(5JV - 12)X?X^]. 

For ra = 4: 

Sn = -W + 3)X5 + m{N - DXaXj 

S 21 = [iN^ -3N + 3)X6 -f 6NiN - l)(\,U + X^)] 

Si, = -SN + 

4- iN(N‘ -ZN + 3)(7iV* - 18iV + 15)X7X2 
-f- 4:N(N^ -ZN + Z)(im^ - 66Ar -|- 63)X6X3 
+ iN(2QN^ - mN^ + 537JV' - 639JV + 351)X6Xi 
+ 12N\17N^ - 71iV" -t- 117Ar - 69)X6X? 

+ 24.N\Z5N^ - 173iV“ + 309iV - 189)X4X3X2 
+ 72N\N - 2)\ZN - 5)X3 + %N\4N^ - 9iV -f- 6)X3X2] 

S 31 = [{N^ -ZN + Z)h 4- QNiN - 1)XbX 2 + 18^(iV - DXiXs] 

S,, = KN - 1){N^ - 3iV 4- 3)“Xio 

4- 4iV(iV' - 3iV 4- 3)(7iV' - ISAT 4- 15)X8X2 
4- ZN{N" - ZN + 3)(13iV' - 42Ar -4 39 )XtX3 
4- 12N{IQN* - IQ6N" + 285N" - ZQQN 4- 180)X«X4 
4- 12iV'(17iV^ - 71iV“ 4- 117iV - 69)X6X1 
+ iNi29N* - 195iV' 4- 5Z7N" - 693iV 4- 351)X6 
4- 48A^'(26iV’ - 125iV' 4- 213i\^ - 129)X3XaX2 
4- 24A^^(35iV^ - 173iV' 4- 309Af - 189 )x5Xs 
4- 2iN\62N'‘ - 326Ar' 4- 597N - 369)X4X3 
4- 96iV’(4iV' - 9iV 4- 6)X4X2 4- 2S8N\iN" ~ 9N + 6)X3X2] 


The University oe Michigan, 
Ann Arbor, Mich. 



ON THE NON-EXISTENCE OF TESTS OF “STUDENT’S” HYPOTHESIS 
HAVING POWER FUNCTIONS INDEPENDENT OF q 

By George B. Dantzig 

1. Introduction. Consider a system of n random variables xi, *2 , ■ • • , 
where each is known to be normally distributed about the same but unknown 
mean, and with the same, but also unknown standard deviation c. The 
assumption, Ho, that f has some specified value, Jo , e.g. Jo = 0, while nothing 
is assumed about v, is known as the “Student” Hypothesis. Two aspects of 
the hypothesis Ha have been already studied extensively. If the alternatives 
with respect to which it is desired to test Ho assume specifically that J > Jo, 
(or J < 0), then we have the so-called asymmetric case of “Student’s Hypothe¬ 
sis” and it is known, [1], that there exists a uniformly most powerful test of Ha. 
This consists in the rule, originally suggested by “Student,” of rejecting Ho, 
whenever 

( 1 ) / = 

where x and S denote the mean and the standard deviation of the observed 
a;,’s and t<, is taken, for example, from Fisher’s Tables [2] with his P = 2a. 
In other words ia is such that 

(2) P(t > t.lHo) = «, 

where a is the chosen level of significance. In accordance with the definition 
of the uniformly most powerful test, whenever any other rule, E, offered to test 
the same hypothesis Ho has the same probability a of Ho being rejected when 
it is true, the power of this alternative test cannot exceed that of “Student’s” 
Test. In other words, if it happens that the true value of J is not equal to Jo 
but is greater, then the probability of this circumstance being detected by 
“Student's” test is at least equal to that corresponding to the rule R 
If the set of alternative hypotheses is not limited to those specifying the 
value of J either greater or smaller than Jo, but includes both those categories, 
then it is known, [1], that there is no uniformly most powerful test of the hy¬ 
pothesis, Ho. However in this case there exists a slightly different test, also 
based on “Student’s” criterion i, possessing the remarkable property of being 
unbiased of type 5i, [3], The test, m common use for a long time, consi,st.s in 
rejecting Ho when 
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with la Ix'iiig taken again from Fisher’s tables, this time conesponding to Ins 
P = a, where a is the chosen level of significance 

In ordei to describe the optimum property of this test wc must use the con¬ 
cept of the power function of a test, [3]. Denote by /3(^, a) the probability of 
the hypothesis Ho being rejected when ^ and or are the true mean and the tiuc 
standard ciror of the observable x.’s The function j3(J, a) is just what is 
called the power function of the test. If wc substitute ? = |o, then we shall 
have /3(^o i a-) = ot iriespcctive of the value of a-. Now the optimum property 
of “Studentts” te,st mentioned above consLsts in that (1) its power function 
has a minimum at J and this is true whatever be the value of o-, (2) what¬ 
ever be any other test of the same hypothesis which has the same level of sig¬ 
nificance a and has property (1), its power function (9'(f, cr) cannot exceed that 
of "Student’s” test 

These two properties, dcmonstiating the excellence of the criterion suggested 
by "Student,” fully justify the geneial oonfidenci' in the test as de,sciibed above, 
or in its extended form where it is applied to two oi moio samples However, 
it is known that “Student’s” te.st in both its foims, t > ta , and | i | > ia , has 
one very unde.sirable property which causes great difficulties in various problems 
of rational planning of cxpci iments 

One of the most important questions to bavc in mind when planning an 
experiment is' What is the probability that tho experiment and the subsequent 
statistical test will dctcet a difference or effect when it actually exists? If we 
perform an experiment and then apply some statistical analysis to test 
"Student's” hypothesis that ^ = |o, we do hope that, if the actual value of ^ 
IS different from , the test will discover this circumstance. But apart from 
mere hope, it is desirable to take precautions so that when the difference, 
{ — £o = A, has some appreciable value, the chance of the hypothesis Ho being 
rejected will be reasonably large. This may be done by calculating the value 
of the power function /3(f, cr) corresponding to the value ^ = fo + A And 
here we come to the unfortunate property of “Student’s” test 

Although the form of the power function of “Student’s” test is known and 
tabled [4], [5], [6], [7], there are occasionally considerable difficulties in applying 
these tables, because it appears that the values n and A are not all its arguments, 
for it also depends on c. Con.scquently m order to have an idea of the proba¬ 
bility that the test will detect the falsehood of the hypothesis Hq that { = 
when actually | = + A we need not only the knowledge of n but also a 

likely value of a The latter is known accurately only m exceptional cases and 
then in those cases one would apply a test which is different from “Student’s” 
test. Usually we have only a vague notion of the magnitude of c and accord¬ 
ingly the tables of /3(f, <r) may be used to obtain a rough idea as to whether 
the arrangement of the experiment planned is satisfactory or not Frequently 
we have no idea of what may be the values of cr 

To Dr. P, L. Hsu is due the idea of looking for te.sts, the power of which is 
independent of the parameters unspecified by the hypothesis tested. In an, 
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unpublished paper, he proved among olher things that the X test of the general 
linear hy-pothesis is the most powerful of all those, the power function of which 
depends on the same argument as that of the X test and not on other parameters. 
The above circumstances suggest the following problem’ to see whether it is 
possible to devise a test of “Student’s” hypothesis such that its power function 
would be independent of a If such a test could be devised and proved to be 
reasonably powerful then the tables of its power function could be used for the 
purpose of planning experiments. 

The purpose of the picsent paper is to show that no such test exists and, 
consequently, this negative result implies in still another way that it is im¬ 
possible to improve on the test originally suggested by “Student.” 

2. Statement of the Problem. The problem of finding a test whose power 
function is independent of a- is equivalent to finding a critical region w such 
that the value of the power function 

(4) a) = P{E ew\ a] 

for any fixed ? is independent of the value of <r, where E denotes the sample 
point {xi, Xi I • • • x,i). We shall show specifically that if this is the case, then 
the power function is also independent of f, so that the test will reject the hy¬ 
pothesis tested with the same frequency independently of whether it be correct 
or wrong. 

3. Theorem. If there exists a region w such that, whatever be the value of a, 

® (vfc^) I I ■•-dx.^a 

w 

(vbff) / ' ■ / ^*’~'*'* dT 2 -' - dx„ S /3, 

w 

where i a, /3 are constants, then 

( 7 ) « - / 3 . 

A legion w is called similar [1] to the whole sample space, W, of size a, with 
respect to a sot of elementary probability laws 'p{E \ 0) given in terms of a 
parameter 6 , U P{E ew \ 8} = a, whatever be the value of 6. Essentially, 
then, the region, w, above is a similar region with respect to two different sets 
of elementary laws each being given parametrically in terms of the parameter c. 

n 

Denote by Wr the portion of the surface of the hypersphere, 23 (*> ~ 

which is common to w, and let the total surface be denoted by Wr ■ Neyman 
and Pearson have shown [1], that a necessary and sufficient condition that w 
be a similar region, in the above case, is that, whatever be r, the probability 
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that the sample point E will fall on the subsurface Wr , when it is known that 
the sample point lies on the surface Wr is a, i.e 

( 8 ) P{E eWr\iEeWr)i^ = ^o)} = a 

for all r. 

In a similar manner let Wp denote the portion of the surface of the hyper- 

n 

sphere £ (a:, — fi)'* = common to w, and let the total surface be denoted 

4=1 

by Wp. Since w is similar to the set of probability laws indicated in (6), we 
have also 


(9) 


P{EeWp\(E^Wp)(^ = ^i)] 


for all p. 

Since on the surface Wr, the elementary probability law, 


( 10 ) 



3 



e~i^, 


is constant, we see that an equivalent statement of (8) is that the hyper-area of 
Wr is a constant proportion, a, of the total hyper-area Wr • Similarly, from (9), 
we have that the hyper-area of Wp is a constant proportion, (3, of the area of the 
hypersurface Wp, whatever he the values of r and p. 

Consider the transformation which expresses Xi, Xi, • • • a:„ in terms of gen¬ 
eralized polar coordinates with pole at the point (fo, fo, • • • j &), i.e. 


( 11 ) 


Xi — go = r cos 62 cos di •• • cos d„-2 cos 0„_1 cos 6 n 

X 2 — go = r cos 82 cos 03 ■ cos 0„_2 cos 5„_i sin 

X3 — go = r cos 82 cos 83 • • cos 0„_2 sin 0„_i 


Xn-i — go = r cos 82 sin 63 


x-n, — go = r sin 82 

Let A be the Jacobian of the transformation: 


( 12 ) 


A 1 = r"' 


n 

n cos 0n-\-2—* 


‘r(0.). 


Consider also a transformation which expresses (xi, ajj, • ■ Xn) in terms of polar 
coordinates, the point (gi, gi, • • • , gi) being pole. It may be obtained by 
replacing in (11), go by g:, r by p, and 8 , by Si. The Jacobian of this trans¬ 
formation is given by j A | = p’'~^T(8,), 

We are now able to express the hyper-area of Wr '■ 

(13) // IA I dej dOa • • ■ = r”"* j f T(8r) dOg d8, .. ■ dJ8„ = Kr""', 

ir, Wr 
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where the integral if > 0 is a constant independent of r. Similarly the hyper¬ 
area of TFp is ifp"~\ where K is the same as in (13). According to (8) and 
(9) we have, now 


(14) 


// 


A I ddi ddi 


dBn = a-K-r' 


n-l 


^5^ J J IA I dfla d&a * • • d6n = ^ 

“p 

Let us consider the distances between the three points: (a:i, xz, • - • , a:„), 
(fo, So, • • , Jo), and (Jj, Ji, • ■ • , |i) The distances of the first point to the 
second point and to the third point we have already denoted by r and p. Let 
the distance between last two be L, then, since the sum of two sides is at least 
equal to the third side of a triangle, we have 

(16) r^p-l-L, p^r + L, where L = \/N | Jo — |. 

Let <p{t) S 0 be an arbitrary monotonic nonincreasing function of t, such that 
the product r'’V(0 is integrable from 0 to -f-oo. Since (p(t) is a decreasing 
function it follows from (16) that 

(17) <p{r) ^ <p(p + L) and p(p) ^ <p{r -f L). 

Consider the integral I: 

( 18 ) ^ ~ 11 dXn. 


We shall express it in terms of the variables r, h, ■ ■ ■ , and also in terms of 
P, di, ■ h and compare the results. Thus 


(19) 


7 = ffl^l<p(r)drdBz dBn 

V 

= jf <p{r)dr j j \ A\dBt‘.- de„ 

10 r 

= a-K-f r"''V(r)dr. 

Jo 


Also we have by (16) 

^ ~ 11 dpdBt ... dBn 
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(20) • ^ If I ^ 1 vip + L) dp dOi • •. d6„ 

V) 

— Jo J J I ^ I ‘ 

IB, 

and consequently 

(21) ^ ^ fi’K f p“ ^<p(p + L) dp. 

Jo 

Since K > 0, we have from. (19) and (21) 

(22) a/^ S jf f-V(< + L)dt/ di. 

By interchanging p and r in (18), (19), (20), and (21) we have also 

(23) p/a ^ jf e-^<p(t + L)dt I jf f-^iDdU 

Let us set in (22) and (23), <p{i) = and <p{i + L) = where p > 0 

is arbitrary. Then 

(24) a/p ^ e”^-" and p/a ^ e"”"'. 

Since (24) holds for all p > 0, let p approach zero. Then Lim = 1, and 
the above inequalities can hold only if 

(25) a^ p, Q.E.D. 

It is of interest to note that there do exist regions such that the power func¬ 
tion is independent of both $ and cr. For example, let Sn be the standard 
deviation of the observed values {xi,X 2 , • • • , ®„) and let S„_i be the standard 
deviation of the values (a:i, 0 : 2 , • • • , a:„_i), then the region w given by all 
points (* 1 , 0 : 2 , ■ • • Xn) which satisfy the inequality {Sn-\/S„) ^ C is such a 
region, i.e. 

(26) P{(S„-i/S„) ^ Cj 

is constant, whatever be the values of { and a-. Such regions are, however, 
unsuitable for testing “Student's” hypothesis { = Jo, because they will reject 
this hypothesis when it is wrong and when it is correct with equal frequency. 


The author is indebted to Professor J. Neyman for assistance in preparing 
the present paper. 
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A METHOD FOR RECURRENT COMPUTATION OF ALL THE 
PRINCIPAL MINORS OF A DETERMINANT, AND ITS 
APPLICATION IN CONFLUENCE ANALYSIS 


By Olav ReiersizIl 

1. Recurrent computation of all the principal minors of a determinant. 

The formulae which I develop in this paper have been worked out for use in 
statistical confluence analysis. By means of recurrent computation they shorten 
considerably the amount of work required to compute all principal minors of a 
square matrix. Originally I elaborated this method as a simplification of one 
given by Frisch (not published). 

Subsequently I found that the method could more easily be deduced from the 
pivotal method. This method has been described, for example, by Whittaker 
and Robinson [5] and by Aitken [1]. 

Let us consider a square n-rowed matrix 


an 

an 

• • • flln 

(hi 


• • • fl2n 

flnl 


• • • O/Jin 


Let the adjoint of this matrix be H Pi, || and let us denote its determinant 
value by Di 2 . .n ■ 

Then we have the following identity 


( 2 ) 


P7i-l,7i-l Pn-l,n 
Pfi.n—1 Pn,n 


DiS...T.DlJ...n-2- 


As Aitken points out, the pivotal method is based upon this identity. 

Next consider the following matrix which is formed from the matrix (1) by 
striking out the nth row and the (n - l)th column: 



On 

• • • ffli.n-! 

Ol.n 

(3) 

nn-2,1 

• • ■ On—2,71-! 

071-2,71 


071-1,1 

• • • On-l,n-2 

On—l,n 
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Let U6 denote its adjoint by || q^, ||, its determinant value by Ai^. „ = . 

The determinant 


Oil 

• 1 • Oi,„_2 



• • ■ fln—2,n—2 

^n—2,n—1 

®n*l 

• * • fl-riifi—2 

fltn.n—1 


we shall denote by Bn.. „ . 

The identity (2) can now be written 


(2') 


Bn., n = 


-Z)l2. • 2,n• *n—2,n—1 “ An.'.nBn n 

lll2> ••n-2 


If we apply the identity (2) to the matrix (3) we get 


$n—2,n—2 

l,ft— 2 


2,n—1 
Q'b— l,w—1 


— An 


nDn 


• *n-3 j 


which may also be written 


(4) 


112 — 


n-a,n-l,n-Dl2 n-2 ~ An. n-S,n-2,n Sl2 n-I 


D 


12 'n-S 


To simplify the notation we will not write the affixes present, but write the 
affixes not present in inverted parentheses. Then our formulae (2') and (4) 
can be written 

n _ An-l<D)n( — AB 

An-l.n( 


^ _ A)n-i(D)n-l,n(, ~ .4),i-l(^)7l ( 

2,n—l,n( 


In an analogous way we get 


p _ . B)n-2(.P)n-l.n( ~~ B)„_^A)„^ 

I^>n—2,n—l.n( 

We may apply these formulae to an arbitrary principal minor D«i „2 . 

Let us now denote by D and denote the absence of one or more of the 

numbers Vi, 1 ^ 2 , • • • by placing them into inverted parentheses. We then 
have the formulae: 


<5a) 


/X ' " ---j 

(6b) 


^ n ’ 

(5c) 

P _ ~~ AB 
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By means of these formulae we can recurrently compute all principal minors. 
We begin with D, = , i = 1, 2 ■ ■ n, A., = an, B., = a,., where i < j. 

Then we compute the D’s with two affixes, 

Dt, = £),D, AtjBfj , 

and then the quantities A, B, D with three affixes, 

^tjk ~ AjkBi AtkBtj 
Btjk = BjkDt — BihAtj 

n . _ J^tkDt, — A,,hBijk _ , , 

--, i <j <k. 

Then we compute the quantities A, B, D with four affixes, and so on. 

If we carry through the computations without dropping any figures we have 
as a control that all divisions will be exact without remainder. If we are 
dropping figures we can control the result by computing the determinant 
Dii. -n in another way. If we wish to control the computation before it is com¬ 
pleted, we may use our recurrence formulae on the matrix which we get from 
the original matrix when the rows and the columns are subjected to the same 
permutation. For example we can reverse the order of the rows and columns. 
Then we can control the (k — 1) rowed minors before computing the A:-rowed 
minors. 

If all the D’s are different from zero, we may reduce the necessary number of 
multiplications and divisions in the following way. We introduce the following 
notations: 



D 


a = 

A 

b- ^ 




_^ 


C “ 

d)vk( 



Substituting in (5), we get the following system of recurrence formulae: 


(6a) 


(6b) 

b = -f- 

(6c) 

^_ b 


dyvti 

(6d) 

d = d)vk-ii + ex 

(6e) 

D = D)vk(d. 
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An affix Vh on a letter indicates the deletion of the last row and column in the 
determinants making up the definition of that letter, even though those deter¬ 
minants are of lower order than Vk ■ Similaily, an affix Vk-t indicates the dele¬ 
tion of the next to the last row and column. 

The a's with two affixes in these formulae are identical with the elements a„ 
of the matrix (1) where t < j. Further, = aji ,i < j, . Applying 

the recurrence formulae (6) we start with these values. 

If the matrix (1) is symmetric, i.e. if a,, = a„ , then we get 

and 


^vivj "Ok ' ^iV2"'Vk ■ 

In this case we can therefore replace £ by A in the formulae (5) and replace 6 
by a in the formulae (6). 

Numerical example. Let us compute all the scatteraiices in the constructed 
example given by Frisch, [3, p, 121], The correlation matrix in this example is, 


1.000000 

-0.121551 

0.656809 

0.752502 

-0.224549 

-0.121551 

1.000000 

0.657698 

-0.732862 

0.212165 

0 656809 

0.657698 

1 000000 

0 014385 

-0.040183 

0.752502 

-0.732862 

0 014385 

1.000000 

-0.280223 

-0.224549 

0.212165 

-0.040183 

-0.280223 

1 000000 

Using our recurrence formulae (6) we get the following table: 


a 

c 

d 

D 

12 

-0.121 551 

0.121 551 

0.985 225 

0.985 225 

13 

0 656 809 

-0.656 809 

0.568 602 

0.568 602 

23 

0.657 698 

-0.657 698 

0.567 433 

0.567 433 

14 

0.762 502 

-0.752 502 

0.433 741 

0.433 741 

24 

-0.732 862 

0.732 862 

0.462 913 

0.462 913 

34 

0.014 385 

-0.014 385 

0 999 793 

0 999 793 

15 

-0.224 549 

0.224 549 

0.949 578 

0 949 578 

25 

0.212 165 

-0.212 165 

0 954 986 

0.954 986 

35 

-0.040 183 

0,040 183 

0.998 385 

0.998 385 

45 

-0.280 223 

0.280 223 

0 921 475 

0.921 475 

123 

0.737 534 

-0.748 594 

0,016 489 

0.016 245 

124 

-0.641 395 

0,651 014 

0.016 184 

0.015 945 

134 

-0.479 865 

0 843 938 

0.028 765 

0.016 356 

234 

0.496 387 

-0.874 794 

0 028 677 

0.016 272 

125 

0.184 871 

-0.187 643 

0.914 888 

0 901 371 

135 

0,107 303 

-0.188 714 

0.929 328 

0.528 418 

236 

-0.179 723 

0.316 730 

0.898 062 

0.509 590 

145 

-0.111 249 

0.256 487 

0.921 044 

0.399 405 

245 

-0.124 735 

0.269 457 

0.921 272 

0.426 516 

345 

-0.279 645 

0.279 703 

0,920 167 

0.919 977 
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a 

c 

d 

D 

1234 

0.000 279 

-0.016 6 

0.016 179 

0.000 262 83 

1235 

-0 031 090 

1.885 5 

0 856 268 

0 013 910 

1245 

0 009 105 

-0.562 6 

0.909 766 

0.014 506 

1345 

-0 020 692 

0 719 35 

0.914 443 

0 014 957 

2345 

0.032 486 

-1.132 8 

0.861 262 

0.014 014 

12345 

0.009 621 

-0.594 7 

0.850 546 

0 000 223 55 


2, Computation of the coefficients of the characteristic polynomial of a 
matrix. The characteristic polynomial of the matrix (1) is 

fflll X flu • • • din 
P(X) = 023 — X • • • Uan 

dill dn2 Unn X 

= Pn ~ -Pn-lX + Pn_2X^ (— 1)’’X" . 

As IS well known, the coefficient P* can be calculated as the sum of all the 
fc-rowed principal minors of the matrix (1). Our method of computing all the 
principal minors of a matrix therefore gives us as a by-product a method of 
computing the coefficients of the characteristic polynomial. Another method 
for the determination of these coefficients has been given by Paul Horst [4] 
Wo may obtain a comparison between the work of computation entailed by 
the two methods by calculating the number of multiplications and divisions 
necessary when using one or the other method. If our recurrence formulae (6) 
are used, two multiplications and one division are necessary for computing a 
2-rowod miiioi, and 4 multiplication.s and one division for every minor with 3 
or more low.s Consequently the total number of multiplications and divisions 
will be 



= 5.2" - in +in + 5). 

On using Horst’s method, llie number of necessary multiplications and divi¬ 
sions will bo found to be 

= (id - l)n^ -h y + ^(n ~ l)(n + 2) 

Hn — i{n — l)(n* -f- n -h 2) n even, 

Hn = Kn — l)(n“ -h d* -f- d -h 2) 


n odd. 
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When n = 2, 3, ■ ■ 12, j?„ and H„ acquire the following values: 


n 

Sn 

H„ 

2 

3 

6 

3 

14 

41 

4. 

43 

105 

5 

110 

314 

6 

265 

560 

7 

558 

1203 

8 

1179 

1827 

9 

2438 

3284 

10 

4975 

4554 

11 

10070 

7325 

12 

20283 

9581 


We see that our method of computing the coefficients of the characteristic 
polynomial involves less calculation when n < 10, while Horst’s method is su¬ 
perior when 71 ^ 10. 

If our purpose is to find the characteristic roots of the matrix we can do this 
with less amount of computation without first finding the coefficients of the char¬ 
acteristic polynomial. See Aitken, [2], 

3. Applications in confluence analysis. The confluence analysis of Frisch is 
set forth in his book- “Statistical Confluence Analysis by Means of Complete 
Regression Systems,” [3]. 

The main method of this book is the “bunch analysis,” which includes the 
computation of the adjoints of the correlation matrices of all sets of variates 
contained in the total set. In section 1, Frisch has described a preliminary 
analysis by means of scatterancea. The scatterances are the principal minors 
of the correlation matrix of the total set of variates, If we carry through such 
an analysis, the recurrence formulae of section 1 of this paper will give a rapid 
method for the calculation of all the scatterances. 

Another application of the computation of all the scatterances arises in the 
determination of the correct time lags between variates in a structural equation. 
This problem will be treated in a paper on confluence analysis which will appear 
in the near future. 
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NOTES 

This seclion is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A CRITERION FOR TESTING THE HYPOTHESIS THAT TWO 
SAMPLES ARE FROM THE SAME POPULATION 

By W. J. Dixon 

1. Introduction. The purpose of this paper is to consider a criterion for 
testing the hypothesis that two samples have been drawn from populations with 
the same distribution function, assuming only that the cumulative distribution 
function common to the two populations is continuous. Let the two samples, 
On and Om, be of size n and m respectively. We may assume n < m without 
loss of generality. Suppose the elements Ui, • • •, m„ of 0„ are arranged in order 
from the smallest to the largest, that is, tii < Wj < . • < . These may be 

represented as points along a line. The elements of Om represented as points 
on the same line are then divided into (n + 1) groups by the first sample, 0„. 
Let OTi be the number of points having a value less than Mi , m{ the number 
lying between % and u,+i , (i = 1, 2, • ,n) and the number greater than 
, (ffiti+i = m - mi - rrii - • - m„). The criterion here proposed is^ 



‘ A similar criterion 



for two samples of the same size was investigated (unpublished) by A M Mood. He 
found the mean and variance to be 


Eid>) 


2n -f- 1 
3n ’ 




8(w - l)(2w + 1) 
46n> 


It can be seen that this is the sum of the squares of the differences between the ordinates 
of the two cumulative sample distributions calculated at the jumps of the first sample 
distribution. 
2. The mean and variance of $C^2$. The only case of continuous cumulative distribution functions $F(x)$ of any interest in statistics is that in which $dF(x) = f(x)\,dx$, where $f(x)$ is a probability density function. Let us write

$$p_1 = \int_{-\infty}^{u_1} f(x)\,dx, \qquad p_i = \int_{u_{i-1}}^{u_i} f(x)\,dx \quad (i = 2, \dots, n), \qquad p_{n+1} = \int_{u_n}^{\infty} f(x)\,dx,$$

where of course $p_{n+1} = 1 - p_1 - p_2 - \dots - p_n$.

Now, the joint distribution law of the $p_i$ is

$$P(p_1, \dots, p_n) = n!\,dp_1 \cdots dp_n, \tag{2}$$

and the conditional distribution of the $m_i$ given the $p_i$ is

$$P(m_1, \dots, m_{n+1} \mid p_1, \dots, p_n) = \frac{m!}{m_1! \cdots m_{n+1}!}\,p_1^{m_1} \cdots p_{n+1}^{m_{n+1}}. \tag{3}$$

Therefore the joint probability law of the $m_i$ and $p_i$ is

$$P(m, p) = \frac{n!\,m!}{m_1! \cdots m_{n+1}!}\,p_1^{m_1} \cdots p_{n+1}^{m_{n+1}}\,dp_1 \cdots dp_n. \tag{4}$$

We have

$$E(C^2) = \sum_{i=1}^{n+1}\left[\frac{\partial^2\varphi}{\partial\theta_i^2}\right]_{\theta=0}, \tag{5}$$

$$E[(C^2)^2] = \sum_{i=1}^{n+1}\sum_{j=1}^{n+1}\left[\frac{\partial^4\varphi}{\partial\theta_i^2\,\partial\theta_j^2}\right]_{\theta=0}, \tag{6}$$

where

$$\varphi(\theta_1, \dots, \theta_{n+1}) = S_m \int \exp\left[\sum_{i=1}^{n+1}\theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right] P(m, p), \tag{7}$$

$S_m$ denotes the usual multinomial summation over all integral values of $m_i \ge 0$ for which $\sum m_i = m$, and the integration is over the generalized tetrahedron defined by $p_i \ge 0$ and $p_1 + p_2 + \dots + p_n \le 1$. If we perform the summation first, we obtain

$$\varphi(\theta) = n!\,e^{\sum_i \theta_i/(n+1)} \int \left(p_1 e^{-\theta_1/m} + \dots + p_{n+1}e^{-\theta_{n+1}/m}\right)^m dp_1 \cdots dp_n. \tag{8}$$

Differentiating twice with respect to $\theta_i$ and setting the θ's equal to zero, we get

$$\left[\frac{\partial^2\varphi}{\partial\theta_i^2}\right]_{\theta=0} = n!\int\left[\left(\frac{1}{n+1} - p_i\right)^2 + \frac{p_i(1 - p_i)}{m}\right]dp_1 \cdots dp_n.$$

If we now integrate and sum from one to $n + 1$, we find

$$E(C^2) = \frac{n(m + n + 1)}{m(n + 1)(n + 2)}. \tag{9}$$

Performing the operations indicated in (6), we obtain $E[(C^2)^2]$, from which we subtract $[E(C^2)]^2$ and have as the variance of $C^2$

$$\sigma^2_{C^2} = \frac{4n(m - 1)(m + n + 1)(m + n + 2)}{m^3(n + 2)^2(n + 3)(n + 4)}. \tag{10}$$
3. Significance values of $C^2$. If we let $C^2_\alpha$ be defined as the smallest value
of $C^2$ for which $P(C^2 \ge C^2_\alpha) \le \alpha$, then we can compute the value of $C^2_\alpha$ fairly

TABLE I

Values of $C^2_\alpha$, $\alpha$ = 0.01, 0.05, 0.10
3   4   5   6   7   8   9   10

4 - - -- 

- - .800 


5 - - ,800 .833 

- .760 .800 .833 


- - - - .857 

6 -- .750 .800 .833 .857 

- .750 .800 .556 .413 





.833 

.857 

.875 




7 

.750 

.800 

.588 

.612 

.467 




.667 

.750 

.555 

.425 

.449 

.426 






.800 

.833 

.857 

.656 

.670 



8 

.750 

.800 

.594 

.482 

469 

.389 



.667 

.531 

.425 

.413 

.357 

.375 

.358 





.800 

.833 

.660 

.677 

.543 

.554 


9 

.750 

.602 

.448 

.413 

.431 

.395 

.381 


.667 

.552 

.454 

.389 

.363 

.356 

.321 

.307 




.800 

.833 

.677 

.555 

.549 

.480 

.449 

10 .667 

.750 

.480 

.493 

.437 

415 

.349 

.340 

.349 

.487 

.430 

.380 

.373 

.357 

.315 

.309 

.280 

.269 


readily for small values of m and n. The values of $C^2_\alpha$ for $m, n \le 10$ are given
in Table I for $\alpha$ = 0.01, 0.05 and 0.10. Since the distribution of $C^2$ is not
continuous, the probabilities $P(C^2 \ge C^2_\alpha)$ will, in general, be less than $\alpha$.
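Table I can in principle be reproduced from the same exact enumeration. The sketch below, again our own and resting on the same equally-likely-ranks argument, returns the smallest attainable value $c$ of $C^2$ with $P(C^2 \ge c) \le \alpha$, or `None` when no attainable value is that extreme (presumably what the dashes in Table I record). Enumeration grows like $\binom{n+m}{n}$, so it is practical only for small samples such as those in the table; we have not compared its output with every tabulated entry. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def c2_null_distribution(n, m):
    """Exact {C^2 value: probability} under the hypothesis (all rank sets equally likely)."""
    dist = Counter()
    for ranks in combinations(range(n + m), n):
        edges = (-1,) + ranks + (n + m,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(n + 1)]
        dist[sum((Fraction(1, n + 1) - Fraction(k, m)) ** 2 for k in counts)] += 1
    total = comb(n + m, n)
    return {value: Fraction(freq, total) for value, freq in dist.items()}


def critical_value(n, m, alpha):
    """Smallest attainable c with P(C^2 >= c) <= alpha, or None if there is none."""
    dist = c2_null_distribution(n, m)
    tail, best = Fraction(0), None
    for value in sorted(dist, reverse=True):
        tail += dist[value]  # tail is now P(C^2 >= value)
        if tail > alpha:
            break
        best = value
    return best


c = critical_value(4, 4, Fraction(1, 10))
print(c, float(c))
```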
It will be seen that if m and n increase indefinitely in the ratio $n/m = \gamma$,
then $nC^2$ converges stochastically to $\gamma + 1$, whereas $nC^2$ ranges from 0 to
$n^2/(n+1)$, which indicates a tail to the right. This suggests that for larger
values of m and n, it is reasonable to try to fit the distribution of $nC^2$ by the
method of moments using a distribution of the form

$$f(\chi^2)\,d(\chi^2) = \frac{1}{2^{v/2}\,\Gamma(v/2)}\,(\chi^2)^{v/2-1}e^{-\chi^2/2}\,d(\chi^2), \tag{11}$$

which has mean v and variance 2v. Setting $\chi^2 = nkC^2$, we see that we can consider $nkC^2$ distributed as $\chi^2$ with v
degrees of freedom. Of course, v is not necessarily an integer, but $\chi^2$ tables
may be used for approximate values of the probability that $nkC^2$ will exceed
certain values,² or the values of $nkC^2$ that will be exceeded a certain per cent
of the time.³ More exact values of the probabilities that $nkC^2$ will exceed
a certain value may be found from a table of the incomplete Gamma function.⁴
To calculate k and v directly, the following formulas, obtained by equating
the mean and variance of (11) to the mean and variance of $nkC^2$, may be used:

$$k = \frac{a\,m(n+2)}{n}, \qquad v = \frac{a\,n(m+n+1)}{n+1}, \tag{12}$$

where

$$a = \frac{m(n+3)(n+4)}{2(m-1)(m+n+2)(n+1)}.$$

If the fitted curve (11) is used to obtain significance values of $nC^2$, there is a
tendency toward rejecting slightly over $100\alpha$%, especially for small values of
m and n. The error is probably due to fitting a curve having an infinite range.
The discrepancy decreases as m and n increase.
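The moment-matching constants in (12) are simple to evaluate. The following sketch (the function name `dixon_chi2_fit` is ours) computes $a$, $k$ and $v$; for Case 2 below it agrees with the printed $nk = 65.068$ and $v = 8.938$ to within rounding, and it also evaluates Case 1 ($n = 9$, $m = 10$) by the same formulas.

```python
def dixon_chi2_fit(n, m):
    """Constants of (12): nkC^2 is treated as chi-square with v degrees of freedom."""
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    k = a * m * (n + 2) / n
    v = a * n * (m + n + 1) / (n + 1)
    return a, k, v


for n, m in [(9, 10), (12, 12)]:  # Cases 1 and 2 of the text
    a, k, v = dixon_chi2_fit(n, m)
    print(f"n={n}, m={m}: nk={n * k:.3f}, v={v:.3f}")
```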

The goodness of fit at the 0.01, 0.05 and 0.10 significance levels was tested
for two cases. 

Case 1. n = 9, m = 10; nk = v = 

The exact distribution in the region under consideration is the following: 


Cl 

... 26 

.28 



.34 

36 

.40 

.42 

.44 

.48 ... 

P(0 > CJ) 

,121 

090 

.082 

.072 

.037 

.033 

025 

.025 

.015 

.007 , . 


The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.422$, $C^2_{.05} = 0.323$, and
$C^2_{.10} = 0.277$. The double rule indicates the divisions (from the fitted curve)
for $\alpha$ = 0.01, 0.05 and 0.10.


² Karl Pearson, Tables for Statisticians and Biometricians, Part I, Table XII.
³ R. A. Fisher, Statistical Methods for Research Workers, Table III.

⁴ Tables of the Incomplete Gamma Function, Biometrika Office, London.
Case 2. n = 12, m = 12; nk = 65.068, v = 8.938.

The important part of the exact distribution for our purposes is: 


Cl 

216 

229 

243 

.256 

.270 . 

326 

.340 

.364 

.381 .. 

p{C^ > cS) 

120 

.109 

.078 

.067 

,046 


.014 

.011 

009 


The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.3316$, $C^2_{.05} = 0.2587$ and
$C^2_{.10} = 0.2244$.


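4. Examples. 1. Two samples of ten members each are drawn and it is
desired to test, using a rejection region of size $\alpha$, the hypothesis that these two
samples could have originated from the same population, about which nothing
is assumed except that it is continuous. The first sample was found to divide
the second sample into the following groups: 0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0. Then

$$C^2 = \left(\tfrac{1}{11} - \tfrac{3}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{4}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{2}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{1}{10}\right)^2 + 7\left(\tfrac{1}{11}\right)^2 = .209,$$

which we see from Table I is not a significant value even for $\alpha = 0.10$, since
$C^2_{.10} = 0.269$.

2. A sample of 15 divides a second of 25 into the following 16 groups: 0, 1,
0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0. Then

$$C^2 = \left(\tfrac{1}{16} - \tfrac{5}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{4}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{3}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{9}{25}\right)^2 + 4\left(\tfrac{1}{16} - \tfrac{1}{25}\right)^2 + 8\left(\tfrac{1}{16}\right)^2,$$

$$nC^2 = 2.302, \qquad k = 7.511, \qquad v = 10.19, \qquad nkC^2 = 17.295,$$

which gives a significant value for $\alpha = 0.10$ but not for $\alpha = 0.05$, since $nkC^2_{.10} = 16.233$ and $nkC^2_{.05} = 18.568$. Actually $P(nkC^2 > 17.29) = .077$.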
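The tail probability for Example 2 can be approximated without tables. The Python sketch below is ours: it evaluates the $\chi^2$ upper tail for non-integer degrees of freedom via the power series of the regularized lower incomplete gamma function, and treats the example with $n = 15$ and $m = 25$ (the group list has $n + 1 = 16$ entries summing to 25, and these values reproduce the printed $nC^2 = 2.302$ and $v = 10.19$). With this routine the tail at $nkC^2 \approx 17.29$ falls between 0.05 and 0.10, consistent with the conclusion drawn in the text; we do not claim it reproduces the printed .077 to three figures. Function names are hypothetical.

```python
import math
from fractions import Fraction


def chi2_sf(x, df):
    """P(chi^2_df > x) for any df > 0, via the power series of the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    a, y = df / 2.0, x / 2.0
    if y <= 0.0:
        return 1.0
    term = total = 1.0 / a
    j = 0
    while term > 1e-17 * total:
        j += 1
        term *= y / (a + j)
        total += term
    lower = math.exp(a * math.log(y) - y - math.lgamma(a)) * total
    return 1.0 - lower


def dixon_test(groups):
    """Return nC^2, k, v, nkC^2 and the fitted upper-tail probability."""
    n, m = len(groups) - 1, sum(groups)
    c2 = sum((Fraction(1, n + 1) - Fraction(g, m)) ** 2 for g in groups)
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    k = a * m * (n + 2) / n
    v = a * n * (m + n + 1) / (n + 1)
    stat = n * k * float(c2)
    return float(n * c2), k, v, stat, chi2_sf(stat, v)


# Example 2: 16 groups, so n = 15 first-sample points and m = 25.
groups = [0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0]
nC2, k, v, stat, p = dixon_test(groups)
print(f"nC2={nC2:.4f} k={k:.3f} v={v:.3f} nkC2={stat:.3f} P={p:.3f}")
```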


5. Remarks. If we set W equal to the number of $m_i$ which are zero and
$V = n + 1 - W$, then V is the number of non-zero $m_i$; further, 2V differs by at most one from U, where
U is the total number of runs, the criterion proposed in the paper of Wald
and Wolfowitz in the present issue of the Annals of Mathematical Statistics.
Now, 

n+l 

(13) W= lim 
so that, setting 

(14) 4* = 2m f exp r fl. ( —^ - —^1 2 


analogous to (7), we have 

E(WC^) = lim 

II.' ■ 


k^i ddl Js-o 
from which we can find

_ 2nCl. — m) 

2)(r^ + „) 
and 

2 ^ 2 _ (n + 3)(n + 4)(w + w - 1) _ 

Pt,c“ = Pvc^ _j_ + n + l)(m + n + 2) ‘ 

If $n/m = \gamma$ (a fixed constant) and n is large


p = — • 

71 + TO 

will be near 1 when n is much larger than m. This corresponds, in computing $C^2$, to dividing the smaller sample into subgroups by the larger. In
this case U and $C^2$ give essentially the same information. When m and n are
more nearly equal the two criteria are quite different. For $n > m$, $C^2$ has
fewer possible values than for $n < m$, and is therefore a more sensitive test
when $n < m$.

While it is doubtful that this test is biased for large samples, this question
will not be considered in the present note.

Princeton University, 

Princeton, N. J. 


SIGNIFICANCE TEST FOR SPHERICITY OF A NORMAL n-VARIATE 

DISTRIBUTION 

By John W. Mauchly

1. Introduction. This note is concerned with testing the hypothesis that a
sample from a normal n-variate population is in fact from a population for
which the variances are all equal and the correlations are all zero. A population having this symmetry will be called "spherical." Under a linear orthogonal
transformation of variates, a spherical population remains spherical, and consequently the features of a sample which furnish information relevant to this
hypothesis must be invariant under such transformations.

A situation for which this test is indicated arises when the sample consists
of N n-dimensional vectors, for which the variates are the n components along
coordinate axes known to be mutually perpendicular, but having an orientation
which is, a priori at least, quite arbitrary. A specific application for two
dimensions, treated elsewhere [1], may be mentioned. Each of N days furnishes a sine and a cosine Fourier coefficient for a given periodicity, and these,
when plotted as ordinate and abscissa, yield a somewhat elliptical cloud of N
points. The sine and cosine functions are orthogonal, and their variances have
equal expectancies for a random series. The arbitrary nature of the orientation
of axes appears here as the arbitrary choice of phase. Of the
five ellipses studied, three could easily have come from circular populations
(random), and two showed highly significant ellipticity.


2. Likelihood ratio criterion for sphericity. The method of Neyman and
Pearson [2] will be used to derive a test criterion which seems entirely suitable.
Let $\Omega$ be the class of all normal n-variate populations, and let $\omega$ be the subclass
of all normal n-variate populations satisfying the hypothesis of "sphericity."
The likelihood ratio criterion is obtained by taking the ratio of the maximum
of the likelihood for variation of all population parameters specifying $\omega$ to the
maximum of the likelihood for variation of all population parameters specifying
$\Omega$. That is,

$$\lambda = \frac{P(\omega\ \max)}{P(\Omega\ \max)}. \tag{1}$$


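For the set $\Omega$, the probability law for a single observation of the n variates
may be written

$$p(x_1, \dots, x_n) = K\,|c_{ij}|^{1/2}\exp\Big[-\tfrac12\sum_{i,j} c_{ij}(x_i - a_i)(x_j - a_j)\Big] \qquad (i, j = 1, 2, \dots, n), \tag{2}$$

where $c_{ij}$ is an element of the matrix $\|\sigma_{ij}\|^{-1}$, the $\sigma_{ij}$ being variances and
covariances, $a_i$ is the mean value of the variate $x_i$ in the population, and K is a
constant the value of which does not concern us here. A sample of N
from $\Omega$ has the probability

$$P = K^N |c_{ij}|^{N/2}\exp\Big[-\tfrac12\sum_{\alpha=1}^{N}\sum_{i,j} c_{ij}(x_{i\alpha} - a_i)(x_{j\alpha} - a_j)\Big]. \tag{3}$$

Letting

$$\sum_{\alpha=1}^{N} x_{i\alpha} = N\bar x_i \quad\text{and}\quad \sum_{\alpha=1}^{N}(x_{i\alpha} - \bar x_i)(x_{j\alpha} - \bar x_j) = N s_{ij}, \tag{4}$$

differentiating the logarithm of P with respect to the parameters $a_i$ and $c_{ij}$,
and setting these derivatives equal to zero, the maximum likelihood estimates

$$\hat a_i = \bar x_i, \qquad \hat\sigma_{ij} = s_{ij} \tag{5}$$

are obtained. Substituting these values in equation (3), the maximum value of the likelihood is

$$P(\Omega\ \max) = K^N |s_{ij}|^{-N/2} e^{-nN/2}. \tag{6}$$

The derivation of $P(\omega\ \max)$ proceeds upon similar lines, but is simpler, for
the probability law for the set $\omega$ is obtained from (3) by setting

$$c_{ij} = c\,\delta_{ij}, \tag{7}$$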
where c is any positive constant, and $\delta_{ij} = 0$ if $i \ne j$ and 1 if $i = j$. The result
is found to be

$$P(\omega\ \max) = K^N s_0^{-nN/2} e^{-nN/2}, \tag{8}$$

where $s_0$ is defined by

$$n s_0 = \sum_{i=1}^{n} s_{ii}. \tag{9}$$

The likelihood ratio criterion is therefore

$$\lambda = \left(\frac{|s_{ij}|}{s_0^{\,n}}\right)^{N/2}. \tag{10}$$

It will be convenient to designate the Nth root of this statistic as $L_{sn}$, where
the second subscript indicates the number of variates:

$$L_{sn} = \frac{|s_{ij}|^{1/2}}{s_0^{\,n/2}}. \tag{11}$$

3. The moments of the distribution of $L_{sn}$ when the population is spherical.
The distribution of $L_{sn}$ cannot easily be obtained in explicit form for a general n,
but the moments of $L_{sn}$ when the hypothesis tested is true are easily found.

Note first that $L_{sn}$ may be resolved into two factors which are, when the
population is spherical, statistically independent:

$$L_{sn} = \frac{(s_{11}s_{22}\cdots s_{nn})^{1/2}}{s_0^{\,n/2}}\cdot\frac{|s_{ij}|^{1/2}}{(s_{11}s_{22}\cdots s_{nn})^{1/2}}. \tag{12}$$

The first factor is just the one appropriate for testing the equality of the n
variances when the orientation of the coordinate axes is fixed in advance, while
the second factor is the square root of the determinant of correlation coefficients.
The moments of the distributions of these two statistics are known [3], and
since the two are independent (for zero correlation in the population), we may
write

$$M_h(L_{sn}) = M_h(A)\,M_h(B), \tag{13}$$

where A and B are used to indicate the two factors, and $M_h$ indicates the hth
moment. The moments are given by

$$M_h(L_{sn}) = n^{nh/2}\,\frac{\Gamma\big(\tfrac12 n(N-1)\big)}{\Gamma\big(\tfrac12 n(N-1+h)\big)}\prod_{i=1}^{n}\frac{\Gamma\big(\tfrac12(N-i+h)\big)}{\Gamma\big(\tfrac12(N-i)\big)}. \tag{14}$$

4. Significance test for n = 2. For n = 1, $M_h(L_{s1}) = 1$ for any h, as it
should, since $L_{s1}$ is then identically 1, and the concept of sphericity is meaningless. For n = 2, the expression (14) reduces to

$$M_h(L_{s2}) = \frac{\Gamma(N-2+h)\,\Gamma(N-1)}{\Gamma(N-1+h)\,\Gamma(N-2)} = \frac{N-2}{N-2+h}, \tag{15}$$

and the distribution is thus found to be

$$dP = (N-2)\,L_{s2}^{\,N-3}\,dL_{s2} \qquad (0 \le L_{s2} \le 1). \tag{16}$$

Thus for n = 2, the significance of the value $L'_{s2}$ obtained from a given sample
of N points in a plane is simply

$$P(L_{s2} < L'_{s2}) = (L'_{s2})^{N-2}. \tag{17}$$

These results for n = 2 were obtained by another method in [1].
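Result (17) gives an exact test for n = 2 that needs no tables. A minimal sketch (names ours) of the p-value and of its inverse, the critical value $L'$ at level α:

```python
def sphericity_p_value_2d(L, N):
    """Exact P(L_s2 < L) for N points in the plane, equation (17); needs N > 2."""
    if not 0.0 <= L <= 1.0:
        raise ValueError("L_s2 always lies in [0, 1]")
    return L ** (N - 2)


def critical_L_2d(alpha, N):
    """Value L' with P(L_s2 < L') = alpha, obtained by inverting (17)."""
    return alpha ** (1.0 / (N - 2))


# Reject sphericity at the 5% level for N = 30 points if L_s2 falls below this:
print(critical_L_2d(0.05, 30))
```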


5. Significance test for n = 3. For n = 3 and higher values of n, no simple
expression for the distribution seems obtainable. In this case it appears reasonable to fit a Pearson curve of the type

$$y = y_0\,x^{p-1}(1-x)^{q-1}, \tag{18}$$

by adjusting p and q so as to obtain agreement with the first two moments of
the actual distribution. The calculations were carried out for $L_{s3}^2$ rather than
$L_{s3}$ itself, to simplify the moment expressions. The first moment of $L_{s3}^2$ is the
second moment of $L_{s3}$, and is given as a function of N by the equation

$$M_2(L_{s3}) = \frac{(3N-6)(3N-9)}{(3N-2)(3N-1)}. \tag{19}$$

Recurrence relations, similar to those noted by Lengyel [4] in carrying out a
similar task, hold for the moments of $L_{s3}$; hence

$$M_4(L_{s3};N) = M_2(L_{s3};N)\,M_2(L_{s3};N+2). \tag{20}$$

Explicit solution of the equations for p and q in terms of N is possible:

$$p = \frac{(9N+5)(N-2)(N-3)}{2(9N^2-8N-15)}, \tag{21}$$

$$q = \frac{2(9N-13)(9N+5)}{9(9N^2-8N-15)}. \tag{22}$$

For values of $N > 30$, acceptable approximations to p and q are obtained by
carrying out the division indicated in (21) and (22):

$$p = \tfrac12(N-4) + \tfrac29 + \frac{70}{81(N+1)} + \cdots, \tag{23}$$

$$q = 2 + \frac{140}{9(3N-2)^2} + \cdots. \tag{24}$$


The values of p and q are given in Table I so that those desiring other than
the standard significance levels may readily enter the Pearson tables.
For N a multiple of 4 from 8 to 48, and a multiple of 10 from 50 to 100, the
significance levels were taken from the Incomplete Beta-Function Tables, using
adequate interpolation. The final Table I was then prepared by filling in the
skeleton table by interpolation with respect to N.
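The Type I fit can be reproduced from (21) and (22). In the sketch below (ours; names hypothetical), `pearson_pq` evaluates p and q and `beta_cdf` integrates the fitted curve numerically with Simpson's rule. In our check the N = 8 row of Table I is recovered: p and q agree with the tabulated 2.3239 and 2.0312 to about $10^{-4}$, and the tabulated 5% and 1% entries give curve areas close to 0.05 and 0.01 when they are read as values of $L_{s3}^2$, the variable to which the curve was fitted. That reading of the table is our interpretation, not a statement made in the text.

```python
import math


def pearson_pq(N):
    """p and q of the Type I fit to L_s3^2, equations (21) and (22)."""
    d = 9 * N * N - 8 * N - 15
    p = (9 * N + 5) * (N - 2) * (N - 3) / (2 * d)
    q = 2 * (9 * N - 13) * (9 * N + 5) / (9 * d)
    return p, q


def beta_cdf(x, p, q, steps=4000):
    """Regularized incomplete beta I_x(p, q) by composite Simpson's rule
    (adequate here because p, q > 1 keep the integrand bounded); steps is even."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    h = x / steps
    f = lambda t: t ** (p - 1) * (1 - t) ** (q - 1)
    s = f(0.0) + f(x)
    for j in range(1, steps):
        s += (4 if j % 2 else 2) * f(j * h)
    return s * h / 3 / math.exp(log_b)


p, q = pearson_pq(8)
print(p, q, beta_cdf(0.172, p, q), beta_cdf(0.083, p, q))
```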

TABLE I

5%, 1% and 0.1% significance levels for the 3-dimensional sphericity criterion,
and the values of p and q for the Pearson Type I curves used in
calculating these levels

   N       5%       1%     0.1%          p          q
   8    0.172    0.083    0.030     2.3239     2.0312
  10     .278     .165     .080     3.3044     2.0194
  12     .366     .243     .139     4.2911     2.0131
  14     .436     .312     .197     5.2816     2.0095
  16     .494     .372     .252     6.2744     2.0072
  18     .541     .423     .301     7.2688     2.0057
  20     .580     .466     .346     8.2642     2.0046
  22     .614     .504     .386     9.2605     2.0038
  24     .642     .538     .422    10.2574     2.0032
  26     .667     .567     .454    11.2548     2.0027
  28     .689     .593     .483    12.2526     2.0023
  30     .708     .616     .510    13.2506     2.0020
  32     .724     .637     .534    14.2488     2.0018
  34     .739     .655     .555    15.2473     2.0016
  36     .753     .672     .575    16.2458     2.0014
  38     .765     .687     .594    17.2447     2.0012
  40     .776     .701     .610    18.2435     2.0011
  42     .786     .714     .626    19.2425     2.0010
  44     .795     .726     .640    20.2416     2.0009
  46     .804     .736     .653    21.2408     2.0008
  48     .811     .746     .665    22.2400     2.0008
  50     .819     .756     .677    23.2394     2.0007
  55     .834     .776     .703        *          *
  60     .848     .793     .725    28.2365     2.0005
  65     .859     .808     .744        *          *
  70     .869     .821     .760    33.2345     2.0004
  75     .877     .832     .775        *          *
  80     .885     .842     .788    38.2328     2.0003
  85     .891     .851     .799        *          *
  90     .897     .859     .809    43.2317     2.0002
  95     .902     .866     .819        *          *
 100     .907     .872     .827    48.2308     2.0002

* No values for p and q were calculated for these values of N; the levels were obtained
by interpolation (see text).

From the results of Wilks [5] it follows that $-2N\log_e L_{sn}$ is, for large N,
distributed approximately like $\chi^2$ with $(n-1)(n+2)/2$ degrees of freedom. However,
equation (24) above suggests that for large N one may get a very good
approximation (for n = 3) by setting q = 2; the significance test for n = 3
then becomes

$$P(L_{s3} < L'_{s3}) = \tfrac12 (L'_{s3})^{N-4}\left[(N-2) - (N-4)(L'_{s3})^2\right]. \tag{25}$$
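Approximation (25) is the Type I curve for $L_{s3}^2$ with q = 2 and $p = (N-4)/2$, whose integral $x^p[(p+1) - px]$ has a closed form. A sketch (ours; the function name is hypothetical): reading the Table I entries for N = 100 as values of $L_{s3}^2$ (our interpretation), the approximation gives tail areas close to the nominal 5% and 1%.

```python
import math


def sphericity_p_value_3d(L, N):
    """Approximate P(L_s3 < L) from (25): a Type I curve for x = L^2 with
    p = (N - 4)/2 and q = 2, whose c.d.f. is x^p [(p + 1) - p x]."""
    x = L * L
    return 0.5 * L ** (N - 4) * ((N - 2) - (N - 4) * x)


print(sphericity_p_value_3d(math.sqrt(0.907), 100))
```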

Probably similar approximations can be found for other values of n. It is a 
pleasure to acknowledge the helpful comments and advice which I received 
from Mr. A. M. Mood of Princeton. Recognition is also due Mr. Wallace 
Brey, a student assistant under the National Youth Administration, who aided 
in the computations. 
A SIMPLE SAMPLING EXPERIMENT ON CONFIDENCE INTERVALS
By S. Kullback and A. Frankel

1. Introduction. In order to illustrate some of the notions of the theory of
confidence or fiducial limits in connection with a course in Statistical Inference
at the George Washington University, we had the class carry out certain simple
experiments, following a suggestion in one of Neyman's papers on Statistical
Estimation [1]. In the belief that the experimental data may be of interest
to others, we present the results herein.

2. The problem. We consider the problem of estimating the range θ of a
rectangular population defined by $p(x, \theta)\,dx = dx/\theta$, $0 \le x \le \theta$, and in particular, for simplicity, we limit ourselves to samples of two and four. We
consider three possible approaches to the problem, viz., by using (a) the sample
range, (b) the sample average or total, (c) the larger (largest) sample value.
Let us consider each in turn.

(a) Sample range. Wilks [2] has shown that for samples of n and confidence
coefficient $1 - \alpha$, the confidence or fiducial limits for the population range θ
are given by r and $r/\psi_\alpha$, where r is the sample range and $\psi_\alpha$ is determined by

$$\psi_\alpha^{\,n-1}\left[n - (n-1)\psi_\alpha\right] = \alpha. \tag{1}$$

For n = 2, α = 0.19 and n = 4, α = 0.1792, (1) yields $\psi_\alpha = 0.1$ and $\psi_\alpha = 0.4$
respectively. Accordingly, for samples of two with confidence coefficient $1 - \alpha = 0.81$, and for samples of four with confidence coefficient $1 - \alpha = 0.8208$, the confidence interval is respectively given by

$$(r,\ 10r) \quad\text{and}\quad (r,\ 2.5r). \tag{2}$$

The length, $\lambda_r$, of the confidence interval is respectively 9r and 1.5r. Using
the distribution of r, $n(n-1)(\theta - r)r^{n-2}\theta^{-n}\,dr$, we have for samples of two:
$E(\lambda_r) = 3\theta$, $\sigma_{\lambda_r} = 2.1213\theta$, and for samples of four: $E(\lambda_r) = 0.90\theta$, $\sigma_{\lambda_r} = 0.30\theta$.
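Equation (1) has no closed-form solution for general n, but its left side, the probability that the sample range is at most ψθ, increases from 0 to 1 on [0, 1], so bisection finds $\psi_\alpha$ reliably. The sketch below (ours; names hypothetical) also evaluates the mean and standard deviation of the length $(1/\psi_\alpha - 1)r$ from the moments of r, taking θ = 1.

```python
def wilks_psi(n, alpha, tol=1e-12):
    """Solve psi^(n-1) * (n - (n-1) psi) = alpha for psi in (0, 1), eq. (1).

    The left side is the c.d.f. of the range of n rectangular observations
    (with theta = 1), which increases from 0 to 1 on [0, 1]."""
    lo, hi = 0.0, 1.0
    g = lambda psi: psi ** (n - 1) * (n - (n - 1) * psi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def range_interval_length_moments(n, alpha):
    """Mean and s.d. of the length (1/psi - 1) r of the interval (r, r/psi), theta = 1."""
    c = 1 / wilks_psi(n, alpha) - 1
    mean_r = (n - 1) / (n + 1)
    var_r = 2 * (n - 1) / ((n + 1) ** 2 * (n + 2))
    return c * mean_r, c * var_r ** 0.5


for n, alpha in [(2, 0.19), (4, 0.1792)]:
    print(n, wilks_psi(n, alpha), range_interval_length_moments(n, alpha))
```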
(b) Sample total. Following Neyman [1, p. 357] let us denote by $A(\theta)$ the
region defined by

$$\theta - A < x_1 + x_2 < \theta + A, \tag{3}$$

where θ is the population range, $x_1$ and $x_2$ the sample values of the sample $E_2$,
and A is selected so as to have $P\{E_2 \in A(\theta) \mid \theta\} = 1 - \alpha$. It is readily found
that $P\{E_2 \in A(\theta) \mid \theta\} = [\theta^2 - (\theta - A)^2]/\theta^2 = 1 - \alpha$, from which we find that
$A = \theta(1 - \alpha^{1/2})$. Accordingly (3) becomes $\theta\alpha^{1/2} < x_1 + x_2 < \theta(2 - \alpha^{1/2})$,
yielding the confidence limits $(x_1 + x_2)/(2 - \alpha^{1/2})$ and $(x_1 + x_2)/\alpha^{1/2}$. For the
confidence coefficient $1 - \alpha = 0.81$ the confidence interval is given by

$$[0.6394(x_1 + x_2),\ 2.2941(x_1 + x_2)]. \tag{4}$$

The length of the confidence interval is given by $\lambda_r = 1.6547(x_1 + x_2)$, so that
$E(\lambda_r) = 1.6547\theta$ and $\sigma_{\lambda_r} = 0.6755\theta$.

Let us denote by $A'(\theta)$ the region defined by

$$2\theta - A < x_1 + x_2 + x_3 + x_4 < 2\theta + A, \tag{5}$$

where θ is the population range, $x_1, x_2, x_3, x_4$ the sample values of the sample
$E_4$, and A is selected so as to have $P\{E_4 \in A'(\theta) \mid \theta\} = 1 - \alpha$. Using the known
distribution of the sample average [3] and $1 - \alpha = 0.8208$, it is readily found
that

$$1 - \frac{(2\theta - A)^4 - 4(\theta - A)^4}{12\,\theta^4} = 0.8208,$$

from which we find that $A = 0.7880\theta$. Accordingly, (5) becomes $1.2120\theta < x_1 + x_2 + x_3 + x_4 < 2.7880\theta$, yielding the confidence interval

$$[0.3587(x_1 + x_2 + x_3 + x_4),\ 0.8251(x_1 + x_2 + x_3 + x_4)]. \tag{6}$$

The length of the confidence interval is given by $\lambda_r = 0.4664(x_1 + x_2 + x_3 + x_4)$,
so that $E(\lambda_r) = 0.9328\theta$ and $\sigma_{\lambda_r} = 0.2679\theta$.
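The constant A for samples of four comes from the distribution of the sum of four rectangular variates (the Irwin-Hall distribution). The sketch below (ours, with θ = 1; names hypothetical) evaluates that distribution and solves for A by bisection; it returns A close to 0.788 and, for samples of two, recovers $A = 1 - \alpha^{1/2}$.

```python
from math import comb, factorial, floor


def irwin_hall_cdf(x, n):
    """P(x_1 + ... + x_n <= x) for n independent rectangular variates on (0, 1)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum((-1) ** k * comb(n, k) * (x - k) ** n
               for k in range(floor(x) + 1)) / factorial(n)


def central_half_width(n, coverage, tol=1e-12):
    """A with P(n/2 - A < sum < n/2 + A) = coverage, taking theta = 1."""
    lo, hi = 0.0, n / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p = irwin_hall_cdf(n / 2 + mid, n) - irwin_hall_cdf(n / 2 - mid, n)
        if p < coverage:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


A4 = central_half_width(4, 0.8208)
print(A4, 1 / (2 + A4), 1 / (2 - A4))  # A and the multipliers in (6)
```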

(c) Larger (largest) sample value. Again following Neyman [1, p. 359] let us
denote by $A_1(\theta)$ the region defined by

$$q\theta < L < \theta, \tag{7}$$

where θ is the population range, L the larger of the two sample values $x_1$ and $x_2$,
and q, a number between zero and unity, to be determined by $P\{E_2 \in A_1(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that $P\{E_2 \in A_1(\theta) \mid \theta\} = (\theta^2 - q^2\theta^2)/\theta^2 = 1 - \alpha$,
from which we find that $q = \alpha^{1/2}$. Accordingly, (7) becomes $\theta\alpha^{1/2} < L < \theta$,
yielding the confidence limits L, $L/\alpha^{1/2}$. For the confidence coefficient $1 - \alpha = 0.81$ the confidence interval is given by

$$(L,\ 2.2941L). \tag{8}$$


TABLE I 


No. of cases of 
coverage per 

Frequency 

set of 100 
samples 

Range 

Sum 

Larger (Largest)

1 

X 

Samples 
of two 

Samples 
of four 

Samples 
of two 

Samples 
of four 

Samples 
of two 

Samples 
of four

69 





1 


70 







71 





1 


72 







73 






1 

74 


1 



1 


75 







76 

4 


3 


4 

1 

77 

2 


6 

1 

2 


78 

3 


6 


3 

1 

79 

9 

2 

4 

2 

3 


80 

3 

1 

6 


4 


81 

2 

2 

1 


3 


82 

2 

1 

6 

1 

2 

5 

83 

3 

3 

3 

1 

5 

3 

84 

3 

2 


1 

4 

1 

85 

3 



3 

2 


86 

2 

2 


2 

2 

1 

87 

1 

1 

2 

1 


1 

88 



1 

2 

1 

1 

89 

1 


1 

1 



90 







91 

1 






Average . . 

39 

15 

39 

15 

39 

15 

81.1 

82.1 

80.2 

84.2 

80 2 

82.1 


The length of the confidence interval is given by $\lambda_L = 1.2941L$, so that, using
the distribution of L, $2L\theta^{-2}\,dL$, we have $E(\lambda_L) = 0.8627\theta$ and $\sigma_{\lambda_L} = 0.3050\theta$.
Incidentally, since $L \le x_1 + x_2$, we have $1.2941L < 1.6547(x_1 + x_2)$, so that
in every case, for samples of two, the confidence interval of procedure (c) is
shorter than the confidence interval of procedure (b).

For samples of four, we consider the region (7) where L is the largest of the
sample values $x_1, x_2, x_3$ and $x_4$ of the sample $E_4$. It is readily found that
$P\{E_4 \in A_1(\theta) \mid \theta\} = (\theta^4 - q^4\theta^4)/\theta^4 = 1 - \alpha$, from which we find that $q^4 = \alpha$.

For $\alpha = 0.1792$, $q = 0.6506$, so that (7) becomes $0.6506\theta < L < \theta$, yielding
the confidence interval

$$(L,\ 1.5370L). \tag{9}$$

The length of the confidence interval is given by $\lambda_L = 0.5370L$, so that $E(\lambda_L) = 0.4296\theta$ and $\sigma_{\lambda_L} = 0.0877\theta$.


TABLE II

                                   Sample      Range            Sum        Larger (Largest)
                                    size   Theor.   Obs.   Theor.   Obs.   Theor.   Obs.
Confidence coefficient                2     .8100  .8110    .8100   .802    .8100  .8020
                                      4     .8208  .8210    .8208   .842    .8208  .8210
Average length of confidence          2    3.0000 2.9660   1.6547 1.6441    .8627  .8556
interval per set of 100 samples       4     .9000  .8976    .9328  .9296    .4296  .4272
Standard deviation of average         2     .2121  .2133    .0676  .0581    .0305  .0293
length of confidence interval         4     .0300  .0335    .0268  .0140    .0088  .0093


3. The Experimental Data. We considered the rectangular population with
θ = 1 and obtained the sample values by using pairs of digits obtained from
Tippett's random sample tables [4]. Using these observed values, the confidence intervals given by (2), (4), (6), (8) and (9) were computed, and the number
of cases in which the value θ = 1 was covered was noted. In all, 3900 samples
of two were observed, subdivided into 39 sets of 100 each. The samples of
four were obtained by combining pairs of samples of two, and 1500 samples of four were studied, subdivided into 15 sets of 100 each. Table I gives the
observed distribution of the number of cases of coverage per set of 100 samples
of two and of four. The length of the confidence interval obtained by each of
the three procedures was obtained, and the observed mean and standard deviation of the distribution of the average length of the confidence interval per set
of 100 samples computed. (Since they are averages of 100 values, these observations are practically normally distributed.) Table II summarizes these
results.
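The flavour of this experiment can be reproduced by replacing Tippett's tables with a pseudo-random generator. The Python sketch below is our own re-run, not the authors' data: with θ = 1 it draws samples of four, uses the first two values as the sample of two (as the authors combined pairs of samples of two), and records how often each of the intervals (2), (4), (6), (8) and (9) covers θ. The seed and sample count are arbitrary choices; with 100,000 samples the coverage rates should land close to the theoretical 0.81 and 0.8208 of Table II.

```python
import random


def simulate_coverage(n_samples=100_000, seed=1):
    """Fraction of samples for which each interval covers theta = 1."""
    rng = random.Random(seed)
    hits = {"range2": 0, "sum2": 0, "larger2": 0,
            "range4": 0, "sum4": 0, "largest4": 0}
    for _ in range(n_samples):
        x = [rng.random() for _ in range(4)]
        two = x[:2]
        r2, s2, l2 = max(two) - min(two), sum(two), max(two)
        r4, s4, l4 = max(x) - min(x), sum(x), max(x)
        hits["range2"] += r2 <= 1 <= 10 * r2             # interval (2), samples of two
        hits["sum2"] += 0.6394 * s2 <= 1 <= 2.2941 * s2  # interval (4)
        hits["larger2"] += l2 <= 1 <= 2.2941 * l2        # interval (8)
        hits["range4"] += r4 <= 1 <= 2.5 * r4            # interval (2), samples of four
        hits["sum4"] += 0.3587 * s4 <= 1 <= 0.8251 * s4  # interval (6)
        hits["largest4"] += l4 <= 1 <= 1.5370 * l4       # interval (9)
    return {key: count / n_samples for key, count in hits.items()}


print(simulate_coverage())
```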
THE NUMERICAL COMPUTATION OF THE PRODUCT OF CONJUGATE
IMAGINARY GAMMA FUNCTIONS


By A. C. Cohen, Jr.

The difference equation

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + c_1x + c_2}{x^2 + c_3x + c_4} \tag{1}$$

was used by Professor Harry C. Carver [1] as the basis for graduating frequency
distributions in a manner analogous to the use of the differential equation

$$\frac{1}{y}\frac{dy}{dx} = \frac{a - x}{b_0 + b_1x + b_2x^2}$$

in the Pearson system of frequency curves. In order to determine a particular
$f_x$ by Professor Carver's method it was necessary to perform the complete graduation from the lower limit of the range up to and including the required $f_x$.
When x is large and only isolated values of $f_x$ are required it seems desirable to
have a method for computing $f_x$ directly, and the present note seeks to accomplish this purpose.

It is well known [2] that the difference equation

$$\frac{f_{x+1}}{f_x} = \frac{(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n)}{(x - \beta_1)(x - \beta_2)\cdots(x - \beta_m)} \tag{2}$$

has the solution

$$f_x = w_x\,\frac{\Gamma(x - \alpha_1)\cdots\Gamma(x - \alpha_n)}{\Gamma(x - \beta_1)\cdots\Gamma(x - \beta_m)}, \tag{3}$$

where $w_x$ is a periodic function of x ($w_x = w_{x+1} = \cdots = K$), and $\Gamma(x + 1)$
for x a positive real number may be defined in the usual manner by the second
Euler integral

$$\Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt, \tag{4}$$
which obeys the recursion formula

$$\Gamma(x + 1) = x\,\Gamma(x). \tag{5}$$

When x is a positive integer,

$$\Gamma(x + 1) = x!. \tag{6}$$

Equation (1) is seen to be a special case of (2) for n = m = 2 and accordingly,
the solution may be written as


$$f_x = K\,\frac{\Gamma(x - \alpha_1)\,\Gamma(x - \alpha_2)}{\Gamma(x - \beta_1)\,\Gamma(x - \beta_2)}, \tag{7}$$

where $\alpha_1$ and $\alpha_2$ are roots of $x^2 + c_1x + c_2 = 0$ and $\beta_1$ and $\beta_2$ are roots of
$x^2 + c_3x + c_4 = 0$. The following simple examples illustrate three special
cases of this solution.

I. All α's and β's are integers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 9x + 20}{x^2 + 5x + 6}$$

has the solution

$$f_x = K\,\frac{\Gamma(x + 4)\,\Gamma(x + 5)}{\Gamma(x + 2)\,\Gamma(x + 3)},$$

which, with the aid of recursion formula (5), can readily be verified by direct
substitution.

II. Either the α's and/or the β's are real irrational numbers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 5x + 6}{x^2 + 3x + 1}$$

has the solution

$$f_x = K\,\frac{\Gamma(x + 2)\,\Gamma(x + 3)}{\Gamma\!\left[x + \tfrac12(3 - \sqrt5)\right]\Gamma\!\left[x + \tfrac12(3 + \sqrt5)\right]},$$

which, with the aid of the recursion formula (5), can also be verified by direct
substitution.

III. Either the α's and/or the β's are complex.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 8x + 17}{x^2 + 10x + 29}$$

has the solution

$$f_x = K\,\frac{\Gamma(x + 4 + i)\,\Gamma(x + 4 - i)}{\Gamma(x + 5 + 2i)\,\Gamma(x + 5 - 2i)}.$$

Since the recursion formula (5) is also valid for complex arguments [3], this
solution can be verified by direct substitution just as in the first two cases.
The evaluation of $f_x$ for a given x in cases I and II involves only computation
of quantities of the form $\Gamma(x)$, which can be accomplished through the use of
existing tables of Gamma Functions for small values of x and through application of Stirling's formula for large values of x. Evaluation of $f_x$ in case III,
however, involves the computation of quantities of the form $\Gamma(u + iv)\Gamma(u - iv)$,
a problem which seems to have escaped previous attention. The remainder of
the present discussion will center about this quantity.

The Gamma Function for a real positive argument has been defined by
equation (4), but for the present purposes it is more expedient to use the
definition

$$\Gamma(z) = \lim_{n\to\infty}\frac{n!\,n^z}{z(z + 1)\cdots(z + n)}, \tag{8}$$

which is valid for all values of the complex argument z except at the poles
($z = 0, -1, -2$, etc.). The above definition is equivalent to (4) at all points
where (4) is valid [3].

From equation (8), it immediately follows that $\Gamma(u + iv)\Gamma(u - iv)$ is a real
number. In fact, we have

$$\Gamma(u + iv)\,\Gamma(u - iv) = \lim_{n\to\infty}\frac{(n!)^2\,n^{2u}}{(u^2 + v^2)\,[(u + 1)^2 + v^2]\cdots[(u + n)^2 + v^2]}.$$

We now develop a formula applicable in evaluating this quantity when u is a
sufficiently small positive integer. As a consequence of equation (8) it can be
shown that [3]

$$\Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin \pi z}. \tag{9}$$

Let $z = iv$ in the above equation; since $\Gamma(1 - iv) = -iv\,\Gamma(-iv)$ and $\sin(i\pi v) = i\sinh \pi v$, we immediately obtain the result

$$\Gamma(iv)\,\Gamma(-iv) = \frac{2\pi v^{-1}}{e^{\pi v} - e^{-\pi v}}. \tag{10}$$

When u is a positive integer, we may write

$$\Gamma(u + iv) = (u - 1 + iv)(u - 2 + iv)\cdots(iv)\,\Gamma(iv), \tag{11}$$

$$\Gamma(u - iv) = (u - 1 - iv)(u - 2 - iv)\cdots(-iv)\,\Gamma(-iv). \tag{12}$$

The product of (11) by (12) gives

$$\Gamma(u + iv)\,\Gamma(u - iv) = v^2(v^2 + 1)(v^2 + 4)\cdots[v^2 + (u - 1)^2]\,\Gamma(iv)\,\Gamma(-iv),$$

which upon substitution of the value found in equation (10) for $\Gamma(iv)\Gamma(-iv)$
becomes

$$\Gamma(u + iv)\,\Gamma(u - iv) = \frac{2\pi v}{e^{\pi v} - e^{-\pi v}}\prod_{k=1}^{u-1}(v^2 + k^2). \tag{13}$$
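Formula (13) is easy to evaluate in logarithms, which avoids overflow when u is large. The sketch below (ours; function names hypothetical) uses $2\pi v/(e^{\pi v} - e^{-\pi v}) = \pi v/\sinh \pi v$, builds the closed-form solution of Case III with K fixed by $f_0 = 29$, and compares it with direct use of the difference equation. The two agree to rounding error; the products $\Gamma(4 + i)\Gamma(4 - i)$ and $\Gamma(5 + 2i)\Gamma(5 - 2i)$ computed this way agree with the values quoted in the worked example to roughly four significant figures.

```python
import math


def log_gamma_pair_integer_u(u, v):
    """log[Gamma(u + iv) Gamma(u - iv)] for a positive integer u and v > 0, from (13):
    (pi v / sinh(pi v)) * prod_{k=1}^{u-1} (v^2 + k^2), summed in logarithms."""
    log_base = math.log(math.pi * v / math.sinh(math.pi * v))
    return log_base + sum(math.log(v * v + k * k) for k in range(1, u))


def f_closed_form(x, f0=29.0):
    """Case III: f_x = K G(x+4, 1) / G(x+5, 2), where G(u, v) = Gamma(u+iv) Gamma(u-iv)
    and K is fixed by the starting value f_0."""
    log_g = log_gamma_pair_integer_u
    log_k = math.log(f0) + log_g(5, 2) - log_g(4, 1)
    return math.exp(log_k + log_g(x + 4, 1) - log_g(x + 5, 2))


def f_by_recursion(x, f0=29.0):
    """Step the difference equation f_{t+1} = f_t (t^2+8t+17)/(t^2+10t+29) from t = 0."""
    f = f0
    for t in range(x):
        f *= (t * t + 8 * t + 17) / (t * t + 10 * t + 29)
    return f


print(math.exp(log_gamma_pair_integer_u(4, 1)), math.exp(log_gamma_pair_integer_u(5, 2)))
print(f_closed_form(4), f_by_recursion(4))
```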
To obtain a result that is applicable when u is not a positive integer, we
make use of Stirling's formula for complex arguments. Lipschitz [4] proves

$$\log\Gamma(z) = \log\sqrt{2\pi} + \left(z - \tfrac12\right)\log z - z + \sum_{m\ge0}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)\,z^{2m+1}} \tag{14}$$

and that the remainder after the mth term is 

„ - 1 r a- 

(2m + 3) (2m + 4) z»-"+» + “L 

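where ε < 1, ε' < 1, and $B_{2m+1}$ designates the Bernoulli numbers ($B_1 = \tfrac16$, $B_3 = \tfrac1{30}$, $B_5 = \tfrac1{42}$, etc.). We are thus able to write

$$\log\Gamma(u + iv) = \log\Gamma(Re^{i\varphi}) = \log\sqrt{2\pi} + \left(Re^{i\varphi} - \tfrac12\right)(\log R + i\varphi) - Re^{i\varphi} + \sum_{m\ge0}\frac{(-1)^m B_{2m+1}\,e^{-(2m+1)i\varphi}}{(2m+1)(2m+2)\,R^{2m+1}}, \tag{15}$$

where $\varphi = \tan^{-1}(v/u)$ and $R = \sqrt{u^2 + v^2}$;

$$\log\Gamma(u - iv) = \log\Gamma(Re^{-i\varphi}) = \log\sqrt{2\pi} + \left(Re^{-i\varphi} - \tfrac12\right)(\log R - i\varphi) - Re^{-i\varphi} + \sum_{m\ge0}\frac{(-1)^m B_{2m+1}\,e^{(2m+1)i\varphi}}{(2m+1)(2m+2)\,R^{2m+1}}. \tag{16}$$

Adding (15) and (16), we obtain

$$\log\Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (e^{i\varphi} + e^{-i\varphi})R\log R - \log R + Ri\varphi(e^{i\varphi} - e^{-i\varphi}) - R(e^{i\varphi} + e^{-i\varphi}) + \sum_{m\ge0}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)}\cdot\frac{e^{(2m+1)i\varphi} + e^{-(2m+1)i\varphi}}{R^{2m+1}},$$

which upon being simplified becomes

$$\log\Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (2u - 1)\log R - 2(v\varphi + u) + 2\psi(R, \varphi), \tag{17}$$

where

$$\psi(R, \varphi) = \sum_{m\ge0}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)\,R^{2m+1}}\cos(2m+1)\varphi. \tag{18}$$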


This result is somewhat similar to that obtained by Karl Pearson [5] in connection with the evaluation of the $G(r, \nu)$ integrals of his Type IV frequency
curve. If $R > 1$, the expansion of $\psi(R, \varphi)$ is asymptotic and the greatest
numerical value that the mth term can have is

$$\frac{B_{2m+1}}{(2m+1)(2m+2)\,R^{2m+1}}.$$

Thus, according to Lipschitz's results, the error committed in dropping all terms
after the mth will not exceed $\pm B_{2m+1}/[(2m+1)(2m+2)R^{2m+1}]$. The following
table gives an indication of the size of the error:

Terms omitted        Error committed in
after                $\psi(R, \varphi)$ less than

1st                  $\pm.0833\,3333/R$
2nd                  $\pm.0027\,7777/R^3$
3rd                  $\pm.0007\,9365/R^5$
4th                  $\pm.0005\,9524/R^7$
5th                  $\pm.0008\,4175/R^9$


It is now obvious that formula (18) will give satisfactory results whenever R
is sufficiently large. The degree of accuracy required together with the value
of R will determine the number of terms of $\psi(R, \varphi)$ to be computed.
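The asymptotic form given by (17) and (18) can be coded in a few lines. The sketch below (ours; names hypothetical) keeps a chosen number of terms of ψ(R, φ), with the Bernoulli numbers in the old numbering used above, and returns a natural logarithm; dividing by ln 10 gives the common logarithms used in the worked example. For comparison it includes the exact integer-u product (13) in logarithmic form. Keeping one term, it reproduces the common logarithms 31.5892 259 and 34.0812 782 quoted for $\Gamma(19 \pm i)$ and $\Gamma(20 \pm 2i)$ to about five decimal places.

```python
import math

# B_1, B_3, B_5, B_7, B_9 in the old (odd-index) numbering used in the text.
OLD_BERNOULLI = [1 / 6, 1 / 30, 1 / 42, 1 / 30, 5 / 66]


def log_gamma_pair_stirling(u, v, terms=1):
    """Natural log of Gamma(u+iv) Gamma(u-iv) by (17), keeping `terms` terms
    (at most 5) of psi(R, phi) in (18). Intended for u > 0 and R large."""
    R, phi = math.hypot(u, v), math.atan2(v, u)
    psi = sum((-1) ** m * OLD_BERNOULLI[m] * math.cos((2 * m + 1) * phi)
              / ((2 * m + 1) * (2 * m + 2) * R ** (2 * m + 1))
              for m in range(terms))
    return (math.log(2 * math.pi) + (2 * u - 1) * math.log(R)
            - 2 * (v * phi + u) + 2 * psi)


def log_gamma_pair_exact(u, v):
    """Exact value from (13) for a positive integer u, in logarithms."""
    return (math.log(math.pi * v / math.sinh(math.pi * v))
            + sum(math.log(v * v + k * k) for k in range(1, u)))


for u, v in [(19, 1), (20, 2)]:
    print(u, v, log_gamma_pair_stirling(u, v) / math.log(10))
```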

We now turn to the solution of the example under Case III and proceed to
calculate $f_4$, $f_{15}$, and $f_{150}$ when $f_0 = 29$. We may write

$$K = 29\,\frac{\Gamma(5 + 2i)\,\Gamma(5 - 2i)}{\Gamma(4 + i)\,\Gamma(4 - i)}.$$

Application of formula (13) gives

$$\Gamma(5 + 2i)\,\Gamma(5 - 2i) = 244.043\,648, \qquad \Gamma(4 + i)\,\Gamma(4 - i) = 27.202\,292,$$

from which $K = 260.171\,676$, and

$$f_4 = 260.171\,676\,\frac{\Gamma(8 + i)\,\Gamma(8 - i)}{\Gamma(9 + 2i)\,\Gamma(9 - 2i)}.$$

Again making use of formula (13), we have

$$f_4 = 260.171\,676 \times \frac{22{,}243{,}314}{1{,}020{,}258{,}636} = 5.6722,$$

and

$$f_{15} = 260.171\,676\,\frac{\Gamma(19 + i)\,\Gamma(19 - i)}{\Gamma(20 + 2i)\,\Gamma(20 - 2i)}.$$

Since R is fairly large in this instance, formula (17) is used and all terms of
$\psi(R, \varphi)$ after the first are dropped. This gives (in common logarithms)

$$\log\Gamma(19 + i)\Gamma(19 - i) = 31.5892\,259, \qquad \log\Gamma(20 + 2i)\Gamma(20 - 2i) = 34.0812\,782.$$
Accordingly, $\log f_{15} = 9.9232\,071 - 10$ and $f_{15} = .8379$.

By the same method $f_{150}$ is calculated and we find $f_{150} = .008723$.

As a check on the accuracy of the results obtained in the above computations,
values of $f_x$ for x ranging from 1 to 15 were computed, using the given equation
as a recursion formula. That is,

$$f_1 = \tfrac{17}{29}f_0 = 17, \qquad f_2 = \tfrac{26}{40}f_1 = 11.05, \qquad \text{etc.}$$

These results are given in the following table, and it is to be noted that the
values in the table for $f_4$ and $f_{15}$ agree with those previously computed by use
of formulas contained in this paper. For obvious reasons, no attempt was
made to compute the value of $f_{150}$ by this method.


TABLE I 


X 

/* 

X 


X 

/(») 

0 

29.0000 

5 

4.3375 

10 

1.6228 

1 

17.0000 

6 

3.4200 

11 

1.3961 

2 

11.0500 

7 

2.7633 

12 

1.2135 

3 

7.7142 

8 

2.2779 

13 

1.0644 

4 

5.6722 

9 

1.9092 

14 

0.9411 





15 

0.8379 
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COMPARISON OF PEARSONIAN APPROXIMATIONS WITH EXACT 
SAMPLING DISTRIBUTIONS OF MEANS AND VARIANCES 
IN SAMPLES FROM POPULATIONS COMPOSED OF 
THE SUMS OF NORMAL POPULATIONS 

By G. a. Baker 

1. Introduction. Biological and sociological data are often “non-homoge- 
neous” and of such a nature as not to be easily separated into components. 
Non-homogeneous populations have been discussed by Karl Pearson, Charlier, 
and others. Non-normal material has been discussed by many writers See 
for example, A. E. R. Church [1] and J. M. LeRoux [2] for a discussion of 
moments of the distributions of the means and variances for samples from 
non-normal material. 

In a previous paper [3] the author has given the distributions of the means 
and standard deviations of samples from certain non-homogeneous populations. 
The purpose of the present paper is to extend the results given in [3] and to 
compare the moment approach of the Pearsonian school with the true distri¬ 
butions. 


2. Moments of the distribution of means of samples of n from a non-homo¬ 
geneous population. Consider a population with distribution 


( 2 . 1 ) 


fix) = 


r.- 




(1 + k)V^l 

The first four moments of (2.1) about a: = 0 are 

km 


a J 


( 2 . 2 ) 


/ 

Vl = 


I 

Vi = 


V3 = 


1 + k 
1 

1 -I- fc 

km 
1 + k 


[1 -f- H- m^)] 


-f- m^] 


/ 

Vi 


= [3 -I- kidc* -I- 6mV -H m% 


The means of samples of n drawn at random from (2.1) are distributed 
according to 


n 

if’ 

A fc* J 

1 -- fiYp ^ 


r 

\/2ir(l -f- fc)" 


OVsc^ + n-s j 

[ str^ + n- a \ 

' ^ 
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Denote by the moments of (2 3) about re = 0 and by nip the moments about 
the mean. Then in view of the relations 


nl s' 


r-l 

= Z^nT—- 


nl 


(w-s)Is' " (» - s)! (s - r + f)! 

Art = 1, A,^r-i) = 1, Aai = 3, Ail = 6, An = 7, 

Afii ‘ lOj An = 25, Asa “ 15, 

and similar relations, and reduction to moments about the mean we obtain 


/ km / 
mi = v-r-r = Vi 


1 + 1 : 


- »T(iW--' + rrH 

+(1 + hy + k) + 6(n - l^ka 


(2.5) 


+ rrifc -1)^ +1: 

^ (^^ ~ 4) fc + l)?n^J 


2 2 
m <r 


mi = 


k 


n*(l + fc)3 |_1S((2«' I)* + l]mcr* - 15 {fc + (2n - l)\m 

+ 30 (n — 1)(1 — k)ma^ 

+ rpfc {- (^ - Dfc + 4(n - 1)1: + 1 }toV 

+ 1 - A:“ — 4(n — l)fc + (^j _ i) 

(1 -j- fc)2 1~ *^+ (— 10a + ll)&“ + (lOn — 11)1;+ Ijm'J. 

ChmcVSTcTebycheff' 


( 2 . 6 ) 


The betas of (2.4) are 

7 2 2 
k m 


iBi = 


n(l + 1:) 


o 2 n , 1 fc 2 


1 + fc 


fco- + 1 + —- ^ 

1 + fc 
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(2.7) iBj-3 = 


tFoIO ^ „2 6 2 . 6 22 . fc '* — 4 :fc 1 4 ”! 

i.[3+3> -(!.> -f+t” + 1 + 1 ’"’ + “a + tr "J 


71 h(T 1 “I" 


k 2 


iBi vanishes if fc = 0, m = 0, or fc = 1 and cr = 1. If fc and cr are constant 
and m approaches infinity iBj approaches (1 — kf/nk. If k and m are constant 
and (T approaches infinity iBi approaches zero. 1 B 2 — 3 vanishes if A: = 0, 
k = 00 , or if m = 0 and <r = I, If k and <r are constant and m approaches 

TABLE I 


wis and pOTj compared for fOUT sets of values of k, and m 


Sets of values 
k cr® m 


1/2 1/4 1.1 


1/3 1 3 2 


4 599 1.228 

71 * n* 


89.702 39 322 

n* n* 





.096 .165\ 

®-T ) 

n 71 * / 



infinity then 1 B 2 — 3 approaches (k‘ — 4fc + l)/nk. If k and m are constant 
and (7 approaches infinity then 1 B 2 — 3 approaches 3/nk. 

It is of interest to compare the higher moments of (2.3) with the higher 
moments calculated from the first four moments on the assumption of a Pearson 
curve in place of (2.3). On this assumption 


pWs = 


27713 (rrii + Tminit — Smitnl) 
9771* — ^27714 + StwI 


It is seen that (2.8) bears little resemblance to ms. If we consider the 
difierence ^mt — mt we see that it Is of the same order in I/ti as is ttij and the 
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numerator is of the 16th degree in k, m, and <r ; a very complicated locus, mg and 
3 ,m 6 are compared for certain values of the parameters of ( 2 . 1 ) in Table I. 

Table I shows that the coefficients of in the expressions for vtf, and pWg 
differ by from two to more than 40 per cent. The coefficients of 1/n* differ 
even more. The assumption of Karl Pearson’s curves to represent the distri¬ 
bution of means of samples of n from non-homogeneous populations seems to 
be adequate in some cases but inadequate in others even for moderate values of 
the parameters. 


3. Moments of the distribution of variances. In [3] an esthnate of n times 
the standard deviation squared is expressed as 

(3.1) = (n — s) ffi "b Sff 2 -b ^^ (nii -b ffb)*, 

n 


where a bar over a letter means an estimate of the corresponding population 
parameter and where {n — s) denotes the number drawn from the first com¬ 
ponent of ( 2 . 1 ) and s denotes the number from the second component. 

For the direct calculation of the moments of the distribution of variances 
it is easier not to use the distribution given in [3], but to proceed as follows. Put 

(n — s) 1 = y, sal = z, — - — (Ml + m*)® = s. 


Of course, for population ( 2 . 1 ) cri = 1 , o-j = a, mi = 0, mi = m. The variables, 
y, z are all independent in the probability sense and their probability distri¬ 
butions are well known. Hence the moments of 

(3.2) ^ ^ + ^ + ^ 

n n 


can be directly calculated. 
For instance, if p = 1 then 


(3.3) 






h 

1-bfc 



In general, of course, the moments about the mean check with the values given 
by Church 

It is generally recommended to represent the distributions of vanances of 
samples from non-normal parents by Pearson’s curves. Let us examine the 
results of this procedure in a special case. 

Suppose that the sampled population is 


The first eight moments of (3.4) which are needed in the calculation of the first 
four moments of the variances are; 
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(3.6) 


v[ = 1.7000 vs = 0 

1)2 = 3.8900 vt = 294.47 

Da = 0 1)7 = 0 

Vi = 28.692 Da = 3,818.4. 



Fig 1. Comparison of the True Distribution op the Variances of Samples op 4 
Drawn prom the Non-Homogbnboub Population (3,4) with the Correspondino 

Empirical Pearson Curve 


The first four moments of the variances of samples of 4 from (3.4) are: 

, , 2M{ = 2.918 iMi = 4,745 

(3.6) 

zMi = 3,396 iMi = 41.52. 

Hence 2 B 1 = ,60 and 2 B 2 = 3.6, k = —.87 which calls for a type 1 curve. The 
equation of the curye is 

2,191 / \ 16 84 
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with its origin at its mode The corresponding true distribution with the 
origin at the beginning of the range is 

wz = e~'*[.3989v^ + .003550 sinh (3.4v^) 

(3.8) 

+ .0005454 sinh (6.8 Va:)]. 

Distribution (3.8) differs slightly from the corresponding result given in [3] 
because of an error in that paper. 

The two distributions are compared in Figure 1. It is seen that the two 
distributions are quite different. As the number of components of distributions 
similar to (3,8) increases, which is true as n increases, the distributions may 
be expected to become smoother and more closelv representable by a singk 
smooth curve. 

4. Summary. The moments of the distribution of the means of samples of n 
from a non-homogeneous population composed of two normal components are 
given up to and including the fifth. This fifth moment is compared with the 
fifth moment calculated on the assumption of Pearson's curves to represent 
the distribution of means. The B's of the distributions of the means are dis¬ 
cussed in certain limiting cases. It appears that for small samples and extreme 
values of the parameters, and in some 'cases of moderate values of the paramer 
ters, the Pearsonian approximations give poor results. 

Some identities involving the binomial coefficients are given which permit 
the reduction of the moments of the distribution of means calculated directly 
to forms given elsewhere [1], A method is given for the direct calculation of 
the moments of the variances of samples from a non-homogeneous population 
composed of two normal components. An indication of the closeness with 
which a Pearson curve can be made to fit the distribution of variances in small 
samples from a non-homogeneous population is given in Figure 1. 
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A LEAST SQUARES ACCUMULATION THEOREM 

By W. E. Blbick 


The following simple least squares theorem does not seem to have been men¬ 
tioned in the literature, and has at least one practical application. 

If 4 *( 2 ) and B*{x) are polynomials of the same degree which are least squares 
representations of the functions A(x) and B{x) respectively, for the values 
* 1 , Xj, a;j, • • ■ I iXp , then 

(1) S A*(a:<)-B(ii) = Aixt)B*{xt) = A*{xt)B*(xt). 

i-i j-i 1-1 

To prove the theorem let 

(2) A*(x) 


and 

(3) 


B*(x) = £ bix’. 

i-o 


Then the normal equations for the determination of 0 , and bj are 

in p 

(4) £ an8(+it = £ z’lA(x,), k = 0,1,2, “ 

<-0 {>-1 

and 

(6) i: = i xiB{xd, 


/■-o 


h — 0, 1, 2, '' 


I m. 


• I"» 


where 


Sr = ^ Xj . Hence, 


by (2) and (6) 


£ A*{xt)B{xi) = £ fS OiXllsCx,) 
«-i 1-1 L<-» J 


( 6 ) 


= £ Bixt) 

4-0 (-1 

in n 

= £ £ cnbiSf+, if n S TO, 
<-0 ;-0 


= £ A*{xt)B*‘{xt) if m. 

Similarly it can be shown that 

(7) i Aixt)B*ixt) = ^ A*{xi)B*{xt) if m'^n. 
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Combining (6) and (7) we have 

(8) 2 A*{xi)B{xt) ~ X] A{xt)B*{xt) = S A*{xt)B*{xt) if m = n. 

(-1 (-1 <-i 

In the particular case A (x) = B{x), equation (8) gives the interesting result 

(9) A*{xt)[Aixi) - 4*(a;i)] = 0. 

t-i 


An obvious extension of equation (6) is 
(10) '£i XiA*ixi)B(xi) = x‘'tA*ixt)B*(xi), if n A- q, 

<-i (-1 


where g is a positive integer. 

A practical application of (8) has been made by one large insurance com¬ 
pany in the case fn = n = 1. Suppose that A (x) represents an annual payment 
made x years ago and is an approximately linear function, and that B(x) repre¬ 
sents a compound interest function. Then, even if B{x) is not a linear function, 
we may write approximately 

t Aix)B(x) s i Aix)B*(x) 


( 11 ) 


= A{x){bo 4- hx) 

X*-I, 

A(x) + hi £ xA(x) 


A(x) + hi E E A(y). 

X^l x^} pmx 

Thus if a year-by-year record is kept of the annual payments Aix), the sum 

^ V r 

2^ A{x), and the double sqm 22 E A{y), and if hg and hi are tabulated func- 

*■!, jfmx 

tions of p, equation (11) affords a convenient method of evaluating ^ A(x)B(x) 
approximately. 

The author wishes to acknowledge that the case m = n = 1 of equation (8) 
and the above application were brought to his attention by John K. Dyer. 


Cooper Union, New York, N. Y. 


















PARABOLIC TEST FOR LINKAGE 
By N. L, Johnson 

1. Introduction. In this paper a problem in testing statistical hypotheses 
which has applications in genetics will be treated from the standpoint of the 
Neyman-Pearson approach. This approach has been developed in a scries of 
papers, [4], [5], [6], [7], [8], [9], [10], to which the reader is referred for definitions 
of the concepts of a simple statistical hypothesis, critical regions, power function 
of a test with respect to alternative hypotheses, and that of a test unbiased in 
the limit employed in the present paper, 

2. Statement of Problem. We shall consider M independent experiments, 
which will each yield results falling into one of the four categories described by 
the possible combinations of the 4 events a, not-a (or d), h, and not-b (or 5) 
as set up in the following table. 



. • 

a 

not-a 


h' 

Pi 

Pi 

Pi 

not-6 

P3 

Pi 

1-Fi 


Pi 

1-Pi 

1 


We shall assume that the marginal probabilities are known and have values 
Pi, I — Pi, Pi, I — Pi as shown in the table. Thus Pi = probability of 
event h happening whether event a occurs or not. It is obvious that if, further, 
the probability of a result falling in any one category or cell is fixed, then the 
other three cell probabilities will also .be fixed. For if pi, Pi, ps, pi be the 
four cell probabilities as shown in the table above, we must have 

(1) Pi + P2 = Fi; Pi + P3 = Fj; P2 + Pi = I - Pi. 

Hence the values of the cell probabilities will be determined by a single parameter 
B, say, as follows 

Pi = FiF/ p, = Pi(l - P,e*) 

Pa = Fj(l - Fie') p, = 1 - Fi - Pj + PiPie\ 

The range of values which B may take for the set of admissible hypotheses is 
found from the conditions 
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(3) Q < Pi < I (i = 1, 2, 3, 4) 

to be 

(4) - CO < 9 < min (-log Pi, -log Pi) if Pi + P 2 < 1 
but 

(6) log (Pr^ + Pr^ - Pr^Pa"^) < d < min (-log Pi , -log Ps) if Pi + Pj > 1. 

The hypothesis tested, Ha , is that d = 0, i.e. that the events a and h are 
independent. It will be noticed that Hq is a simple hypothesis, since it specifies 
the probability law of the observed variables completely. In fact, if m, be 
the number of results out of our M experiments which are in the fth category, 
then nil, rrii, nia, mi are our observed variables, and we have 


(6) P{mi = m'i, mi = m's, mi — mi | Ha) = 


M\ poi' P '02 PoV Ph 
mil mil vtil mil 


where pat is the value of pi when 0 = 0. 

This is the conceptual model used in testing for linkage in two pairs of genes; 
Ha corresponds to the hypothesis “there is no linkage.” Fuller explanations 
are given by Fisher [3]. It should be noted, however, that Fisher uses a pa¬ 
rameter S corresponding to ie' ill this paper. 


3. Basis of Selection of Test. The question now arises; what test shall we 
choose for the hypothesis Ho? That is, what should the critical region w be 
to give us results as satisfactory as possible? The main aim must be to avoid 
errors, both of first and second kind, as far as possible. The first kind of error 
is subject to control, since the probability of the sample point E falling in w 
when Ho is true (which we shall denote hy P{E ew\Ho]) can be determined 
approximately. Ho being simple. The critical region w is therefore chosen, if 
possible, to give a definite level of significance to the test associated with it. 
However, there will usually be many regions which will do this, and in 
order to decide which of them give more satisfactory results we consider 
(1 — P{E tw\ H}); i.e. the probability of the second kind of error with respect 
to an alternative hypothesis H, the first kind of error being fixed. 

In the present case H will be determined by B and so we may put 
P{E ew\H} = p{w 1 6), where p{w 1 B), considered as a function of B, will be 
the power function of the test associated with the critical region w. We want 
w to be such that fi{w | 0) = «. a being the fixed level of significance while 
P{w\ 6) is as large as possible. 

It is also desirable that we should accept the hypothesis Ho more often when 
it is true than when any one of the alternative hypotheses (H) is true. Ex- 
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pressed symbolically, this means that 

(7) 1 0) < p{w I $) for all d 0. 

Any test satisfying the last condition is said to be unbiased. 

If (3 and — are each continuous and differentiable functions of 6, and we 
ad 

consider only those alternative hypotheses specified by suitably small values 
of 6, sufficient conditions for the test to be unbiased will be 

® 'A. = 

According to the terminology recently adopted by Daly [1], the tests of 
which it IS known only that they satisfy (8) and (9), arc called locally unbiased. 
If a region w could be found such that, v being any other region for which 

(10) 0(w I 0) = ^(v I 0), then /J(w | d) > i3(v | 0) 

for all 6 0, this would give a test which would be the best with respect to any 

alternative hypothesis. However, it has been shown by Neyman [4] that under 
certain conditions, which many probability laws satisfy, such a test will not 
exist An attempt is therefore made to control the power of the test with 
respect to hypotheses specifying values of 6 near to 0; hoping that the powers 
of the tests so obtained with respect to the other hypotheses will behave in a 
satisfactory manner. Thus Neyman and Pearson [9] define an “unbiased test 
of Type A" as a test corresponding to a critical region w such that if v be any 
other region in the sample space W for which 

(11) 0(w ( 0) = ^(» I 0) = a 

and 


( 12 ) 

then 

(13) 


a/3(w I g) *] ^ 8i3(t> I 9) 1 

dO _i-o 99 _ 9-0 


d"§{w I 9) ~| ^ s‘p{v\ey 

09 * _ 09 * .»-« 


In the problem which I am treating the conditions 

(14) = o 


implied by (11) and (12) above cannot, in general, be satisfied, since the distribu¬ 
tion is discontinuous, i.e. PIE ew \ Ho] is a discontinuous function of w and, in 
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fact, for a given sample size, has only a finite number of possible values, none 
of which need be equal to a. 

However, it may be possible to find a test of of a typo called “unbiased 
in the limit (as M increases),” based on the limiting form of the multinomial 
distribution which is a continuous function of w. The definition [6] of a test 
“unbiased in the limit” will be taken as follows" 

Suppose we have a sequence (wu) of critical regions, Wm corresponding to a 
sample of size M, such that 
(i) for any M, if Vjt be any region for which 


(ii) 

(18) 


(15) 

1 

and 


(16) 

d^{wM 1 6) 

then 


(17) 

a'/3(Mji, 1 fl)‘ 


_ d${VM I 0) 1 

de J, 

I ey 


> 




lim fiiwM 10) = a. 


(m) if 

(19) 

( 20 ) 


d = y^{6 - 0 ) = VMe 


hm = 0 

jif-ioo dd Jtf~o 


then the test associated with this sequence of critical regions is unbiased in the 
limit, 1 shall call such a test a test of type A„ . 

The reason for using d as the variable in condition (19) above is that, unless 
our sequence of critical regions has been very badly or unluckily chosen, we 
shall have 


(21) lim /3 (u)m I 5) = 1 {6 ^ Q) 

while, by (18), lim ^{wm | 0) = a and so, in general, lim ^ — will not 

W-»oo M~*oo Ov 

exist at 5 = 0. Hence we introduce termed the normalized error, and, keeping 
d constant (and hence making d tend to zero) we form lim ^^(^^ I _ 

M-*io dd 

In the next section will be obtained a test of Ho which is of type A„ . 


4. Derivation of Test. The composition of a sample of M experiments is 
uniquely determined by the numbers of results mi, ms, wia falling in the 1st, 
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2nd and 3rd categories respectively. Thus any sample may be represented by a 
point E{m) in a three-dimensional sample space W{m) with coordinate axes of 
mi, m 2 , and m 3 . It will occasionally be convenient to represent the sample 
by a point in a three-dimensional space with other axes. The following sample 
spaces will be used. 



Wim)- 

■space with coordinate axes of mi, 

, mj, 

mi 




W{d)— 

it a it a it 

da , 

di 





W{x)— 

a n t( It tt ^ 

Xi , 

Xi , 

Xs 





Win)— 

it it a it ti ^ 

rii , 

fii , 

ni 




where 








(22) 


II 

! 

# 



(i = 

1. 2, 

3, 4) 

(23) 


X, = (m, - Mpo.)/(Afpo.)* 



ii = 

1,2, 

3,4) 

(24) 


n, = mifM 



ii = 

1,2, 

3,4). 


I shall use w m indifferently to denote “the critical region corresponding to 
sample size M” in any of the four sample spaces above; E indifferently to 
denote corresponding positions of the sample point in any of the four sample 
spaces' except in cases where confusion might arise, where I shall use 
Wjuid), Wuia:), Wu{n) and E{m), E{d), E{x), E{n) When necessary the size of 
sample with which a point E is associated will be denoted by a subscript; e.g. E m ■ 
In finding a test of type we shall need to consider the quantities 


9ty 36 ^ 




, where ^ = 6 -s/M. 


The probability law of the observed values mi, m 2 , m 3 is discontinuous with 
respect to the points of the sample space W m ■ For if .B® be a point which 
corresponds to integral values mj, m?, m? of m-i, m 2 , m 3 ; subject to the re¬ 
strictions 


(25) 

(26) 


0 < mj {i = 1, 2, 3) 

Q < M 

1-1 


then 

(27) 


F[Em = Ef^\0 = 0 } 


mjl mil mjl mjl 


where 

£ m< = A/ 
1-1 


(28) 
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Pol = PlPt P 02 — -PlCl ~ Pi) 

(29) 

Poo = P2(l - Pi) P04 = (1 - Pl)(l - Po) 
while if Bf be not such a point 

(30) P{Eu = lf‘\9}=0 
whatever the value of 8 may be. Now 

(31) 

u>v OTi'»ij! ma! mi! 

where Pi, pa, Pa, Pi are as defined in (2) above, and 52 denotes a finite sum- 

mation over all points E' mwju for which P{P ai = P' | 6} 0. Differentiating 

each side of (31) with respect to 6, we get 


dd _ 9^ ID A/ 


Ml pti^poipZ‘pZ^ 

mil fflalmal will 


fmi(l — Pi — Pa) — mjPs — maPi -f OTiPiPa 

L (l-POd-Pa) . 




*j/1 fl) ~[ _ y 

J»-o 


Ml 

wilmalmslwi! 


(33) • (!- ~ Pi)i(r-P, y» - Pi- Ps) - w»2P* - «^3Pi + MPiPa}^ 

- (miPiPj(l - Pi - Pa) -h wiaPad - Pi - PiPa) 

+ WaPid - Pa - PiPa) - MPiPa(l - Pi)(l - Pa))]. 

Theorem 1 . The sequence of critical regions [wm) defined hy 

(34) V Bu^ > A in Wm ] t> -f Bifi < A elsewhere, 


_ ^i(PiPa)*d - Pi - Pa) - ®aP}d “ Pa)*Pa “ ^3Pj(l - Pi)*Pi 
^ {PiPa(l-Pi)(l-Pa))* 

Pid - Pi)(2Pa - l){a:i(PiPa)* + laPJd - Pi)*) 

_ +Pa(l-P*)(2Pi-l){a;i(PiP2)‘-ha:2P}(l-Pa)*) 

[PiPad - Pi)d - Pa) {Pid-Pi)d - 2Pa)»-b Pad - Pa)d - 2Pi)»)P 

/oyi R ^ r MPiPad - Pi)(l - Pa) _1* 

LPid - Pi) (1 - 2?*)*+ ?*(!-Pa) (1 - 2Pi)^J 
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+» 


(38) 


If 

27r *Lgo 




f e ^’^dirfdu = a 

JA-Bui I 


avd Xi = (Mp )* defined above, is associated vnth a test of the hypothesis 

Hold = 0) which is unbiased in the limit, of type at level of significance a, 
provided that 


(39) 


0 < Pi <1 


and Pi and P 2 not both equal to 
In Lemma 1 of the Appendix (paragraph 9), put s = 2, and let 


fi = individual members of the summation for ^(wa 10) 

t, t, 9i3(wAf|0) 1 

de J 


fi - 


it it 


M 


/a ~ 


d^fijwM I d) 


36^ 


_ 


(*■ = 1 , 2) 

(see (31)) 
(see (32)) 

(see (33)). 


(40) 


From Lemma 1 we see that the regions (w) defined by 
/a > Oi/i + <hfi in w 
U ^ + “ 2/2 elsewhere 

will maximize 2 fo with respect to all regions for which 2 fi and 2 /a are fixed. 

to V) W 

(fli and 02 are arbitrary constants depending on the fixed values of 2 fi and 

w 

X/s). Hence any sequence of critical regions (waj) defined by 


(41) 


(mi(l — Pi — P2) — mzPa — msPi + MP1P2Y 

- {miPiPj(l - Pi ~ P2) + irnPiil - Pi- P1P2) 

+ m,Pi(l -P2- P1P2) - MPiPjd - Pi)(l - Pj)) 

> Oi{mi(l — Pi — Pj) — TThPi — msPi + MP1P2] + o* 

in Wjtf, will satisfy conditions {i) given above in the definition of a test of 
type A„ . The inequality (41) may be rewritten 

{mi(l — Pi — P2) — m 2 Ps — WjPi + MP1P2 — fla)* 

(42) - [P2(l - Pi) K - MPi{l - P2)) 

+ Pi(l - P2){m3 - MP2(1 - Pi)}] > 0* 


the a<'s being arbitrary constants. 

Also, by Theohem I of the Appendix, we have that, for any given e > 0 
and any region w, there is a number M, independent of w and such that for all 
M > M., 
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(43) 

1 0{w 1 0) — /(w) 1 < e 

where 


(44) 

«(*) 

and 


(45) 

3 

Xo = 2 2:?(1 + PoiPo/) + 2 S x,Xj{p(i,poi)^Pal. 


t -1 i<}S3 


We will now apply a transformation to the coordinates mi, rrii, mi which will 
(a) transform inequality (42) into a simpler form, 

(&) transform I{w) into a form to which the tables of the Normal Probability 
Integral may easily be applied for purposes of calculation. 

This transformation is 


,,,, xiiPiPi)Kl -Pi- P 2 ) - - P^?P2 - XiPiil - Pi)^Pi 

“ =-|p.p,(i-p,)(i-p.)ii- 


(47) v = 


(48) f = 


Pi(l - Pi)(2P, - DlxiiPiP^)^ + x,Pj(l - Pi)*) 

+ Pj(l - P2)(2 Pi - l)(a;i(PiP 2 )* + XiP\{l ~ P 2 )*) 
[P,P2(1 - Pi)(l - P 2 ) {Pi(l - Pi)(l - 2P2)^ + P 2 (l ~ P 2 )(l - 2Pi)^}]i 

(2Pi - l)(*i(PiP,)* + XiHH - Pi)*} 

- (2P, - l)(*l(PlP 2 )* + *2P}(1 - P 2 )*} 
{Pi(l - P,)(l - 2Piy + P2(l - P2)(l - 2Piy]i 


This is a proper transformation, since imder the conditions of the theorem 
0 < P, < 1 and Pi and Pi are not both and the Jacobian 


(49) 


j _ d(u, V , t) _ I 

d(xi,X 2 ,Xa) 


is non-zero and of constant sign. 
Also 


(50) xo = u^ + v^ + f. 
Hence 

(51) 


The inequality (42) is transformed into an inequality of form P(m — as)* v>A 
where B has the value stated above; as and A being at present arbitrary 
constants. 

Therefore we may put as = 0 and define A by the equation 


2ir 




du = 


a 


(52) 
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and conclude that the seCiiience of critical regions (Wm) defined by the in¬ 
equalities 


(53) 


Bu^ + « > ri in ic Af 
Bu” + u < 4 elsewhere 


will satisfy conditions {i) for a test of type A „ . 
From (51) and (52) 






.1 rL-.-r 

2ir 1 Ja-bu^ 


(54) 


By Theorem 1 of the appendix, as mentioned above, we have 
(55) I Kwm I 0) — I(wm) I < e for all M > M, 

i.e. 


(56) 1 ^(w Af 1 0) — a 1 < e for all M > Mt 
and so 

(57) /3(ieAf I 0) —>• a as M—»o®. 


Thus the sequence of critical regions (wm) satisfies the condition (^^) of the 
definition of a test of type A„ . 

If w be any region defined by inequalities on u and v only (as are the regions 
Wm) then, as a special case of Theorem 1 of the Appendix, we have that for 
any e > 0 there exists a number Mt such that for all M > J14, 


(58) 


Pm{w) - ^ If du dv 

\0{UtV) 


< f 


where Pm{w) = e u; j 0}. 

By (31) and (32), noting that — = VM • , we have 

du Otr 


= r fiiw, v) • (PiPj)‘(l - Pi)“Hl - Pj)'^ 

(■gg^ on Jtf_0 v: 

= Hfi{u,v)-uk 

W 

where k = {P^Pifil - Pi)"*(l - P*)”* > 0. 

By Theorem 1 of the Appendix, as last stated above, we have 


(60) 


f,(u. v) = 1 ARAt;.e-‘'“’+'’'(l + Rm) 

Air 
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where for convenience we have written Am, At) for A(j/) 'U, A(M)t) the units of u 
and V when sample size is M, and Rn for v) which has the property that 

(61) £ Rm{u, !;)A(M)M.A(jif) —> 0 


uniformly with respect to u) as Af «). 

Now let M)'*' denote that part of w where Ejif > 0 and wT that part of w where 
Rt, < 0. Then 


(62) 


2 fcw/l(w, v) 



AuAt) , V 

—— .Me + Zj ^ — uKmb 

25r w+ 2ir 


Let 

(63) 


)Si = E 

IU + 


AuAv 


uRtre 


-i{u*+»2) 


= k 







By Schwarz's inequality 


(64) 


St 

< 

k 



5 


AuAv 


ni^ J? 

U Km^ 


AuAv p — 


i 


But 

(65) 5 u‘Mu, •) - 5 ^ + 5 

Now u%(u, e) > 0 and 2] u^fiiu, v) is finite (since is a homogeneous function 
w 

of second degree in the a:,’a and so has a finite expectation) and is bounded 
as M —^ oo. Hence 22 ’^h^} *') is finite and bounded as M —> <». Further, 

WT 

as M —> <w 

(66) 22 ^ ff dM dv. 

^ 27r ^ 

ifl+ 

Hence 22 is bounded as ikf <». From this result, 

Ztt 

together with (61) and (64) it follows that Sm —> 0 as M uniformly with 
respect to w. Putting 

(67) 


it will follow in a similar manner that (STii 0 as ilf —> <» uniformly with 
respect to w. Hence 


( 68 ) 


OV Jtf-0 VI 


U 27r 


where Su = Su + )Si and so/Sj^-^OasM—» « uniformly with respect to w. 
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Hence whatever be e > 0, there is a number M' such that for all M > M[ 


a/3(u) I d-) 

0t? 


-Iff 

Ja-o 2^ J J 


ue du dv < e 


whatever be the region w. In particular we may take w = Wm , and then 
we have 

(70) A // ^ e"*”’ ^^4 du = 0 


and so 


i>M I ty) ~j 

9i> Ja-o 


for all M>M'. 


i.e., 

(72) to . 0. 

Hence the sequence of critical regions {w u) satisfies condition (iii) for a test 
of type . This completes the proof of Theohem 1 
In the above theorem we have found a test which is unbiased in the limit for 
all cases except that for which Pi = Pi = The following theorem derives 
the test appropriate to this special case, and it is found that in this instance the 
test takes a very simple form. 


Theorem 2. If Pi = P 2 = 2 , the sequence of critical regions {wm) defined hy 


(73) 

1 a:2 + Xs 1 >0 

in Wm 

1 X 2 + Xs 1 <0 

elsewhere 

where 



(74) 

vr.L‘ 


(75) 

m, — 

(I 

CO 


IS associated with a test of the hypothesis Ho{d = 0) of type A„ at level of 
significance a. 

The proof of this theorem follows the same lines as that of Theorem 1 as far 
as inequality (42). On putting Pi =. P 2 = ^ in (42) we get 

(— — Os)* — i(m2 + ma — ^M) > 04 


(76) 
ie., 

(77) 


(xi + X3 — ««)* > 07 . 
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The critical region Wm defined in the statement of the theorem is of this 
form with oe’ = 0 and 07 = a*. 

Hence the sequence of critical regions (wx) satisfies conditions (i) of the 
definition of a test of type A „ . The sequence of critical regions may also be 
shown to satisfy conditions {n) and {in) for a test of type by following the 

lines of the proof of Theorem 1 and noting that la + X 3 = (m^ + mg — jM) 

tends to be distributed as a unit normal deviate as M -4 00 
On account of the shape of the critical regions in the general case, I shall for 
the remainder of this paper call the tests derived in the above theorem the 
‘parabolic tests for the cases considered. 


6. Application of the Parabolic Tests. For practical purposes the formulae 
derived above are inconvenient to use. I will therefore express them in terms 
of the deviations of the observed frequencies in the four cells from the frequen¬ 
cies “expected” when the hypothesis = 0 ) is true, i.e. in terms of the 
variables di, where 

(78) d, = m,~ Mpo, = x,(Afpo.)^ (t = 1, 2,3,4), 

The test then becomes “reject the hypothesis Ho at level of significance a if 
V + > A" where 


(79) w = 

(80) D = 

(81) 

(82) 


di(l — Pi — Fa) — daPa — djPi 
{MPM ->i)(l “ Fa)}*'" 

Fi(l - F0(2Fa - l)(di + do) + Fad - Fa)(2Fi - l){di + da) 


[MFiFad -Fi)(l -Fa){Fi(l~Fx)(2Fa-l)'-bF3(l-F3)(2Fj-l)*)]» 


B.r_ m 

LFi(1 - P0(1 - 


MFiFad - Fi)(l - Fs) 


2Fa)* + Fad - Fa)(l 


- i 


except when Fi = Fa = 5 , In the latter case reject the hypothesis Ho if 


(83) 

where 


da "h da 

jMi 


> a 


(84) 


1 /•+“ 
— 7 =- f 
\/2ir J-a 


dx 


1 


a. 


The application of this last case (Fi = Fa = ^) is straightforward, a may be 
found from the tables of the Normal Probability Integral, da and do may be 
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calculated from the data, and we may then see whether the inequality (83) is 
satisfied, and so assess our judgment of the hypothesis Ht,. 

TABLE I 

Significance of Symbols 


A and B are connected by the following relation. 


Tri. 

Table la 

Table Ib 

a = 

0.05 

a = ' 

0.01 

poi = A — 

3 8414588 B 

Pm = a — I 

S.6348966 B 

B 

P.os 

B 

P.Ol 


1.6449 

0 

2.3263 



1.00 

0.289 

1.26 


1.25 

.231 


.212 

1.50 

.192 

1.75 

.181 

1 75 

.165 


.158 

2.00 

.144 

2.26 

.141 

2.25 

.128 


.127 

2.50 

.115 

2.75 

.116 

2.75 

.106 



3.00 

.096 

3.25 


3.25 

.089 



3.50 

.082 

3.75 


3.75 

.077 



4.00 

.072 

5 


5 

.058 

6 


6 

.048 

7 


7 

.041 

8 


8 

.036 

9 


9 

.032 



10 

.029 

15 


15 

.020 



20 

.014 



30 

.009 



40 

.007 

50 


50 

.006 


The general case is also straightforward, except for the determination of A 
from equation (81). To facilitate this I have constructed Tables la and Ib. 
These tables correspond respectively to significance levels 05, .01, and from 
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them the value of A corresponding to a given value of B may be calculated. 
The quantity tabled, (p), is the difierence between A and a multiple^ (constant 
for a given level of significance and given with the table to which it applies) of 
B. To find A, therefore, B is calculated, multiplied by the appropriate con¬ 
stant, and added to the quantity in the table corresponding to B. For large 
values of B (40 and over) p is small, and A may be taken equal to the constant 
multiple of B 

In particular cases when the values of Pi and Pa are substituted in the expres¬ 
sion for B (see Theorem 1 above) and in (79) and (80) above, these equations 
appear much less formidable. Thus in the case considered by R. A. Fisher 
[3], Pi = P 2 = j and we get 



u = - di- di); V = - iiQMpi2di -f- dj + ds) 

and the test becomes “reject the hypothesis Ha at level of significance a when 


(86) 

(/) = ( (2di — da — ds)^ 

— |■(2d^ + da -|- 

da)}/[mMf] >A 

where 




(87) 

rfh’ 

f dv 

}■ du = a. 

27r J—ao 


1 


Example, Fisher [3] gives an example of the case Pi = P 2 - In the 
series of experiments that he quotes the observed results fall in the four cate¬ 
gories respectively as follows: 

mi = 32; ma = 904; m 3 = 906; mi = 1997. M = 3839. 

Hence di = -207.9375; dj -f d, = 370.375. From ( 86 ), 0 = 10863.1. B = 
37.94239. From the tables: 

at .05 level. A.oi = 3 8414588 X 37.94239 + 0.0075 = 145.7615 

at .01 level, A 01 = 6.6348966 X 37.94239 + 0.0065 = 251.750. 

Hence we reject the hypothesis that 6 = 0, i.e. that there is no linkage, since 
the value of if) is well outside even the .01 level of significance. 

6. Power function of the Tests. General Case. The parabolic test as de¬ 
scribed above has the desirable property that of all tests (at level of significnace 
a) which are unbiased for large values of M this test will detect small variations 
in 6 most frequently. However, to get a clearer idea of the properties of this 

‘ This multiple is equal to ICa where . — ; e~*‘’ dt = 1 — a, a being the level of 

V2ir J-h, 


significance. 
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test we shall calculate, as accurately as may be practicable, the power function 
of the test. 

As a preliminary step we obtain a rough idea of the power function by making 
use of the concept of a limiting power function as stated by Neyman [6]. This 
may be defined as follows: 

Let E M' denote the sample point corresponding to a sample of size M', and put 

( 88 ) P{E^, ew\S'} = 

where ■&' = M'^d, w being a fixed region. Supposing i}' kept fixed, let M' increase 
and let 

(89) d-Cto 1 d') — lim ^M'iw I ti') 

Af 

if this limit exists. 

Then /9«(w | d') is the limiting power function of the test associated with the critical 
region w. It will be noted that the limiting power function is a function of d'. 

In the problem under consideration the parabolic test when the sample size 
is M is associated with the critical region Wu. Now it should be noted that 
in the definition of the limiting power function w remains fixed. Therefore 
the limiting power function of the parabolic test for sample size M is 

(90) I &') = lim fiu'iwjtr 1 1 >). 

The significance of the limiting power function is that for any t > 0 and for 
any d' there is a number Me,» such that for all M > we have in our case 
(by Theorem 1 of the Appendix) 

(91) 1 ^m(wm I d') - U') I < «• 

It should be noted, however, that the limiting power curve (the graph of the 
limiting power function against Q = dM~^) may be only a very rough approxi¬ 
mation to the actual power curve. Furthermore (Neyman, [6, p. 83]) we can¬ 
not, in general, use the limiting power function of a test to answer the question: 

“How large must we take our sample size M to detect the falsehood of the 
hypothesis HoiB — 0) when actually 6 = 6', with a limiting probability of at 
least, say, 0.95?” 

For if we form a table as below 

M d[u) = M^d' 1 d^M)) 

100 

1000 

it is possible that /9 „(m) | d( m)) may never attain the value 0.95. 
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Theorem: 3 The limiting power function of the parabolic test is 

(92) I,?) = r r -w a 

ZlT i/—oo JA—Bu^ j 

in all cases for which 0 < Pi < 1 and Pi and are not both equal to §. 

The proof of this theorem, follows immediately from Theorem 1 of the Ap¬ 
pendix by applying the transformation (46)-(48) and putting \ = P 1 P 2 
The above remarks concerning special precautions to be taken with respect 
to the limiting power function suggest the necessity of studying the actual 
power function of the parabolic test by some other method 
With this object in view, a study was made of the distribution of the function 
1 ^ = w + Bu^ for finite values of M and in particular for M ~ 100 and M = 3839, 
0 is a discontinuous variate and, for any given value of M, has definite limits 
of variation arising from the limitations on the values of the variables m, stated 
in the inequalities (25), (26) above These limits of variation of 4> were found 
to be 

(93) - - xV) < 4 >< - 1) 


for the case Pi = Pj = Hence when 

M = 100, -12.25 <4> < 5486 86, 

M = 3839, -75.89 < < 1310795.76. 

Also it was found that 


(94) (§(</) I 0) = B' 


1 + 


(1 - 2Pi)(l - 2P0 ,, , (M - DPiPa 

(1 - Pi) (1 - P.) (1 - Pi) (1 - Pi) 


(e’-l)j 


where S(<f)) 0) denotes the expected value of </>, given the value of the parameter 
0 Thus when Pi = Pa = i we have B = \/|-M and so &(cf) \ 0) = 

Hence when 


M = 100, 6(<f. 1 0) = 6.12372, 

M = 3839, &(<i> 1 0) = 37.94239. 


It is thus seen that the distribution of <j) might be represented by a Type HI 
curve, since the distribution of <f> has a finite lower bound and a very long 
positive tail. In order to fit a Type III curve, we must know the second moment 
of the curve as well as its lower bound and mean. The general expression for 
the second moment about zero is too complicated to be printed and so only the 
numerical expressions obtained by giving special values to M are given below. 
These are: 


{i) Af = 100 

1 6) = 112 41667 + 165 62963(e’ - 1) + 2493.33333(6" - 1)" 

+ 1078.00000(e" - 1)" + 4356.91667(e" - 1)', 


(96) 
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(it) M = 3839 

&(4>^ I e) = 4318.79213 + 6397.29625(e^ - 1) + 3684321.24073(e* - 1)= 
+ 1636267.33265(6* - 1)= + 261530062.11111(e* - 1)\ 

Using the above results Type III curves were fitted to the distribution of 4>, 
and approximate values of the power functions ^(id^ ] 5), at level of significance 
05, were calculated. This was obtained by evaluating P(4) > A os 1 0j and 
assuming the distribution of <j> to be that given by the fitted curve. Then 

(97) I (?) = > A OB I el. 

The values obtained for the limiting and approximate power functions are 
given in Tables Ila, lib. Unfortunately the agreement between the two is 
not satisfactory. 

Sfecial Case. For the cases Pj = = i (Jlf = 100, M = 400) power 

functions were calculated on the assumption that for a given value of d, the 
random variable 2M~\d<i + ds) is distributed normally about a mean M\e — 1) 
with standard deviation -\/e\2 — e"). This is approximately the case for the 
values of M considered. The approximate power functions so calculated are 
given in Tables Ilia, Illb 


7. Parabolic Test and Test. It is interesting to note the close connection 
between the parabolic test and the x* test as introduced for intuitive reasons 
and normally used in testing for linkage The x'* test consists of calculating 
the quantity 


(98) 


.2 

X 


1 

MPyPiil - Pi)(l - Pi) 


{(1 - Pj)(l - POmi 


— Pi{l — Pi)rrh — Pi(l — P2)Trt8 + PiP2TO41* 


and rejecting the hypothesis Ho($ = 0) if | x | > a, where 

In the special case (Pi = ^2 = 1) the parabolic test and the test are iden¬ 
tical; while comparing (98) and (79) we see that in the general case 


(100) tt = X. 

Hence in the general case the criterion used in the parabolic test may be 
written 


( 101 ) <^ = + Bx\ 

(1) Large Samples. For large samples the first term of the expression v + 
Bx is usually of small importance, since 
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« IS of form M'^ X (linear function of the d^’s), while 
Bx^ is of form M~^ X (quadratic function of the d.’s) 

For such samples the % test and parabolic test would appear to be nearly 
equivalent. 


TABLE II 

Limiting and Approximate Power Functions of Parabolic Test 

Pi = A = i 
-«.<$< 1.386 


Table lla 
M = 100 


9 

Power 

Limiting 

Approximate 

-2.00 


0.90870 

-1.50 

0.99880 


-1.40 


0.77656 

-1.20 

0.97915 

0.69506 

-1.05 

0.93786 


-1.00 


0.68580 

-0.90 

0.85024 


-0.75 

0.70467 

0.42756 

-0.60 

0.51632 


-0.45 

0.32258 

0.21849 

-0.30 

0.16986 

0.12504 

-0.15 

0.07905 

0.05689 

-0.10 

0,06280 

0.04438 

-0.05 

0.05318 

0.03866 

0.00 

0.05000 

0.04069 

0.05 

0.05318 

0.06021 

0.10 

0.06280 

0.07429 

0.16 

0.07905 


0.30 

0.16986 

0.26559 

0.45 

0.32258 


0 60 

0.51532 

0.75864 

0.75 

0.70467 

0.94245 


Table lib 
M = 3839 


0 

Power 

Limiting 

Approximate 

-0.25 

0.99932 

0.99853 

-0.20 

0.98502 

0.97521 

-0.15 

0.87243 

0.83620 

-0.10 

0.54197 

0.52066 

-0.05 

0.17827 

0.19223 

0.00 

0.05000 

0.04111 

0.06 

0.17827 

0.21568 

0.10 

0.64197 

0.59617 

0.15 

0.87243 

0.91641 

0.20 

0.98602 

0.99640 

0.25 

0.99932 

0.99999 


Theorem 4. The limiting power function of the x* test is 

(102) 1,1) = 1 - 1 p 

J-O 

denotes the region defined by the inequality | x 1 > o). 

This theorem may be proved by applying (46)-(48) to Qa{xi, xt, aJs) in 
Theorem 1 of the Appendix, and noting that u = x by (100), 



TEST FOB LINKAGE 


245 


We notice that 1 d), for a given value of d-, has the same value for all 

values of M, unlike the limiting power function |3 „(i0m j i?) of the parabolic 
test It is this point which accounts for the seeming paradox that, despite the 
manner in which the parabolic test was defined, for all values of and M 

(103) 1 »>) > 11?) 

as may be deduced from (92) and (102). This does not mean that for any 
given 1 ? and all M sufficiently large the power function of the test, 1 ■&), 

TABLE III 

Approximate Power Function 
Pi = Pi = I 
- 00 < e < 0.693 

Table Ilia. Table Illb. 


M 

= 100 

M 

= 400 

e 

Power 

d 

Power 

-0.45 

0.96288 

-0.25 

0.99424 

-0.40 

0.92161 

-0.20 

0.95482 

-0.35 

0.86072 

-0.15 

0.79787 

-0.30 

0.74351 

-0.10 

0.47734 

-0.25 

0.60197 

-0.05 

0.16378 

-0.20 

0.44054 

-0.02 

0.06810 

-0.15 

0.28380 

0.00 

0.05000 

-0.10 

0.16727 

0.02 

0.06885 

-0.05 

0.07737 

0.05 

0.17609 

0.00 

0.06000 

0.10 

0.55737 

0.06 

0.08029 

0.15 

0.90213 

0.10 

0.18177 

0.20 

0.99431 

0.16 

0.36464 

0.25 

0.99995 

0.20 

0.60278 



0.25 

0.82071 



0.30 

0.94976 



0.35 

0.99299 




is necessarily not less than the power function of the parabolic test, /3 Af(i« « j i?). 
For although, given any « > 0, there is a number M,,i such that if ilf > 

(104) I 1 1>) - 1 >?) 1 < e 

and 

(105) l^v(wvk) - |3.(w«lt>) 1 < « 
it may be that for such values of 

(106) 0 < 1 - P„(wm 1 «>) < 2t. 
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The above results show, however, how close the agreement between the power 
functions of the two tests is for large values 6f M In fact wo have 

(107) lim 1 0) = 1«?)- 

M—f<c 

This may be easily proved, since as M increases Wu approximates to 
(2) Small Samples. In order to obtain some idea of the relations between 
the two tests when M is small (i e. less than 100), the case Pi = P 2 = i, M = 32 
was considered in some detail. 

In this case our tests at 5% level of significance are respectively 
test, reject if 

(108) \2y - z\> 8.315 
parabolic test, reject if 

(109) (21/ - zf - U2y + z) > 69.576 
where 

(110) y = di z = d!t + d}. 

All samples for which the verdicts of the two above tests would not agree 
were obtained. These were as follows; 

(a) Samples for which Ho is accepted by test, rejected by parabolic test 

Probability of drawing sample of this type 
when Ho is true is 0.00320. 

(b) Samples for which Hq is rejected by parabolic test, accepted by test 

i/=0 1 2 3 5678899 Probability of drawing sample 

-— of this type when Hq is true is 

2 = 9 11 13 15 1 3 5 6 7 8 9 0,00038. 

Thus the probability of the two tests giving different verdicts when Ho is in 
fact true is only 0.00358 

It will be noted that the above results imply that 

(111) fe(ics21 0) - Mwx’ 1 0) = 0 00320 - 0.00038 = 0.00282; 

i.e. that the true levels of significance of the two tests are not equal, This is 
to be expected, because of the discontinuity of the probability distribution of 
sample points, which makes it unlikely that the level of significance of either 
test is exactly .05. 

Similarly we can obtain values of paiwso | 9) — 1 6), the differencesin 

the powers of the two tests with respect to various alternative hypotheses. 
These values were obtained for a few values of 6. 


y = ' 

1 

0 -1 

-2 

2 = 

-6 

1 

00 

1 

0 

-12 
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6 032(W31 I fl) — fi3l(w^3 I 6) 

-0.5 0.01625 

0.0 0.00282 

0.5 - 0.00006 

These figures indicate that the parabolic test detects negative d’s better than 
the t®st, but that the x^ test detects positive 6’s better than the parabolic 
test, although the advantage in this latter case is minute. 

The critical regions associated with the two tests may be represented by 
regions in the (y, z) plane The critical region for the parabolic test will be 
defined by 


( 112 ) 


i2y - zf - |■(2y + z) > V 


and that for the x* test, w^i , by 

(113) {2y - zf > v' 

where v — v'. 

is therefore the complement of the region lying between the lines Li, Lj 
with equations 2y — z = ±\/v'',Wm hes outside the parabola K with equation 
{2y - zf - l(2j/ + z) = V. 

Since v ^ v', K meets Li, Lt at points near the respective intersections of 
Li, Li with the line 2y + z = 0. See Figure 1. 

In the diagram the regions V\, Fj contain all sample points for which the 
x’' test rejects and the parabolic test accepts ffo; f7i, Vi contain all sample 
points for which the x” test accepts and the parabolic test rejects ffo. 

For a given value of 0 it is known that the probability distribution is approxi¬ 
mately such that the quantity 


(114) 


...» _ [y- - 1)1 

V«M + TVM(e»-1) 


^ (z -h W{e - 1)1* 


tVM - W - 1) 


+ 


ly + z + -^M(e* - 1)1* 
-h *M(e» - 1) 

is distributed as x'* with 2 degrees of freedom. 

The ellipses of equal density f S = constant have centers at points - 1], 

— |M[e* — 1]) which must lie on the line 2y z = 0 When 0 = 0 the center 
is at the origin, and the major and minor axes of the ellipse make angles of 
approximately 99.5® and 9.6® respectively with the y-axis. For small changes 
in 0 the angles of inclination of the major and minor axes of the ellipse to the 
coordinate axes are not greatly changed, and we see that as the center of the 
ellipse moves along the line 2i/ + z = 0 we have 
(1) 6 increasing', center moves downwards, tending to increase P\E eUi} — 
\E t Ys] while P{E e 7il and P[E e fM both become small. Thus | 0) 

tends to increase quicker than /3 w(Wx* I ®) 
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(^) 6 decreasing: here we have the opposite effect and PiuiwM \ 9) tends to 
increase slower than \ S). 

These conclusions agree qualitatively with those drawn in the case M = 32. 
(N.B. In the case = 32 no sample points fall into the region Ui because no 
points in I7i satisfy the inequalities (25), (26)). 

8. Some Geometrical Considerations. In this section we shall consider the 
manner in which the situations dealt with above may be interpreted in terms 


z 



of geometrical concepts. It will be convenient to consider as variables n, = 
mjM. The sample space W{n) is then bounded by the four planes 

0 {i = 1, 2, 3), 

1 . 

In this space, corresponding to any admissible hypothesis Ha specifying a 
value of 9, there is a point Ta with coordinates (9"h , 9"’) where 

= Pi(l - Pae), 

r = P2(l - Pie*). 


(115) 


n, = 


3 

£ n, 

t-l 


(116) 
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These are the proportions of results expected in the first three cells, if the 
hypothesis He specifying d be true 
Now, if Ho be true, we have 

( 117 ) P{ni = n'l, 712 = ni, Hi = ni ,ni = n'i\ Ho] ^ 
where c is constant for a fixed sample size M, and 


( 118 ) 


2^ 

M 


z 


(n.' - d^'Y 
6 ’^' 



1 - £ e"' 


Hence the most frequent position (s) of the sample point E will be some¬ 
where near the point To, which I shall therefore call the center of density It 
will be noticed that, whatever be the value of 0, the point 7'o must he on the line 

(119) ni - PtP2 = -K - Pi(l - P,)] = -[no- Pzil - Pi)]. 

This line, a segment of which is the locus of the center of density for our set of 
admissible hypotheses, will be called the line of density. 

In this space the parabolic test corresponds to a critical region comprising the 
exterior of a parabolic cylinder The equation of the boundary of this critical 
region at level of significance .05 was found for the case Pi = Ps = j, and a 
model made of it. Also included in the model were the ellipsoids 

(120) = Km 
where Km is a constant so chosen that 

(121) P(X6 > Tf.osl 0) .05 

corresponding to 

(i) the case when Ho is true 

[ii) the cases when 

(122) (a) Pi = p 2 = ps — P4 = M be. 6 — 0.41 

(123) (5) Pi = Pi = Pi — Pi = a i.e. 6 = —0.69. 

It was found that in the case Pi = Pj = I one axis of all the xrcllipsoids 
was perpendicular to the plane through the line of density and the axis of n,. 
The generators of the boundary of the parabolic acceptance region are also 
perpendicular to this plane. (By “acceptance region” is meant the complement 
of the critical region. The acceptance region may be written symbolically 
Wu.) There were further added to the model the intersections with this plane 
of the ellipsoids at probability level 01, corresponding to the three hypotheses 
consideied above {6 = 0, 0.41, —0.69) and two others, viz. 

(124) Pi = p 2 = Pi = Pi — ^ be. & = 0.92, 

(125) pi = A; P 2 = P3 = P4 = if i.e. 6 = -1.39. 
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For convenience in making the model to a simple scale (1 unit s 150 cms.) it 
was found necessary to take the sample size M as 1312.5. The model is shown 
in Figure 2 It will be seen that the acceptance region for the parabolic test 
is approximately enclosed between two parallel planes perpendicular to the 
plane common to the line of density and the axis of Wi. These two planes, in 
fact, enclose the acceptance region for the x° test. The vertex of the normal 



Fig. 2 

parabolic section of the parabolic acceptance region is at a comparatively great 
distance “below” the plane rii = 0. 

As an interesting digression we may use our model to compare qualitatively 
the parabolic test with yet a third possible test of ffo ■ This test is to reject 
Ho at level of significance .05 if 

(126) xl>Koo 

and may be called the xo iesi. The xo-ellipsoid shown in the model is the ac¬ 
ceptance region for this test. It will be noticed that when ff 0 the ellipsoids 
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of equal density include somewhat more of the acceptance legion of the xo test 
than of the parabolic acceptance region. This means that the xo test would 
detect that the hypothesis Ha(d = 0) is false in these cases, less frequently than 
would the parabolic and x^ tests. We also notice that the center of density 
Ts leaves the parabolic acceptance region before it leaves the acceptance region 
of the Xo teat as it moves along the line of density from the point where 6 = 0, 
whether the direction of motion of Ts corresponds to 6 increasng or decreasing. 
This also indicates that the xo test would act less efficiently than the other 
two tests. 


9, Appendix. In this appendix are obtained various results which, while 
essential to the main argument, would appear as digressions if they were inter¬ 
polated as required The numbering of equations in this appendix does not 
continue from that of the previous sections, but forms a separate group. 

Lemma. 7/ fa{m), • , /,(m) he (s + 1) functions of the k variables 

mi, m 2 , ■ , mk which are zero except for a finite number of sets of integral values 

of mi, ■ ■ • ,mk] and if Wo be a region in the space of m’s such that 

« 

(1) /o(w) > ^ a,/*(wi) in ujq 

4 

(2) /o(m) < 5^ in Wq 


Oi , 02 , 

(3) 

we shall have 


, Ofc being arbitrary constants; then if w he any region such that 

S /t(m) = ^ /,(m) (i = 1,.. •, s), 


(4) 

Proof. Let 


23/o(m) < 23/o(wi). 
w u>0 


(5) 


8 = 2/o(w) - £/o(m) 

UIQ W 

= foim) - 2 /oW 

Wq—WWQ W’-VfWQ 


where wwa denotes the common part of w and Wo . 

Hence the region w — wwt,, consisting of those points of w which are not in 
wwi , and so not in Wo, is contained in ©o • Similarly the region Wo — wwo is 
contained in Wo. Hence, by inequalities (1), 

6> 2 ISo./.Cw)}- £ {SoJ.Cm) 

too—Wo ) li>—wu >0 V.4"! 

5 > 2 Is - £ Is 

»o V.<“1 ) w / 


( 6 ) 

and so 
(7) 
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Since the total number of terms in each double summation is finite, we have 

(8) 5 > 2 a, 

^ — 1 IIJQ V} 

But 

(9) = L/.W, (f = 1, • •. , s). 

tuo tu 

Hence 

S > 0, and S/oW < Ij/oim). 

W UlQ 

A lemma similar to the lemma above, where the f's are taken to be integrable 
functions and summation over the regions w, roo is replaced by integration over 
these regions, is given by Noyman and Pearson [9] The proof given above 
follows the lines of the proof given in that paper. 

Theohem 1. Suppose that, in a quadrinomial population: 

(i) the cell probabilities are dependent on the number M of trials made, and are 


given by 

Pi == Poi + <Pm 

(10) 

Pa = Poi Vm 

p> ~ P03 ~ Vm 

Pi = Poi + Vm 

where 


(11) 

4 4 

2] jjo, = 2 Pi = 1 

.-1 4-1 

and 


(12) 

‘Pm — X(6^'**^ ~ 1) 


(ii) 

(13) = (m.- - a = 1, 2, 3, 4) 

where mi = number of results falling in i-th cell. 

(lii) w{x), or briefly w, is a region in the space W of xi, , xs ; and Pm{w) 

is the integral probability law of w corresponding to the values pi, pi, pi, of 
the cell probabilities given in (2) above when we have M independent trials. 

Then 


Pm{w) 


1 

(2ir)*pM 


III 




(14) 


dxi dxi dx^ 
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uniformly over W as M =o, where 

3 

2:3) = E + Po*Pm) + 2 |)^ E 

1=1 1<)S3 

(15) - 2XtJ{a:i(po/ - P 02 W) “ ^i!(po2 + Po2Poi) 

4 

“ 23(p03 + P03P04^)) + E Pl)i> 

i“l 

This theorem may be proved by the same method as that used by F. N. 
David [2] in proving the generalized theorem of Laplace 

I would like to thank Professor Neyman for his invaluable suggestions and 
advice in the preparation of this paper. 
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REDUCTION OF A CERTAIN CLASS OF COMPOSITE 
STATISTICAL HYPOTHESES 

By George W. Brown 

1. Introduction. A situation frequently met in sampling theory is the fol¬ 
lowing: X has distribution /(x, 9), where 9 is an unknown parameter, and for 
samples (xi, ■ • • , x„) there exists in the sample space En a family of (n — 1)- 
dimensional manifolds upon each of which the dwtribution is independent of 
9; in addition there is a residual one-dimensional manifold available for estimat¬ 
ing 9. For example, suppose there exists a sufficient statistic T for 9, then on 
the manifolds T = Ta there is defined an induced distribution which is inde¬ 
pendent of the parameter. 

A similar situation is observed when 9 is a "location” or "scale” parameter. 
Let X have the distribution/(x — o) for some a, then the set (xa — Xi, xj ~ 
Xi, ‘ •, x„ - Xi), or any equivalent set, such as (xa - x, • ■ • , x„ — x), have a 
joint distribution independent of a, and there is a residual distribution corre¬ 
sponding to each particular configuration (xa — Xi, • • • , x„ — Xi). Fisher 
[1] and Pitman [5] have examined the residual distributions in connection with 
the problem of estimating scale and location parameters. In this paper we 
shall be concerned primarily, not with the residual distribution, but with the 
remainder of the sample information, corresponding to the (n — l)-dimensional 
distribution which is independent of the parameter. It is found, in a rather 
broad class of distributions, that the part of the sample not used for estimation 
determines, except for the parameter value, the original functional form of the 
distribution of x, 

This paper is devoted mainly to a study of particular classes of distributions 
having the property mentioned above. We consider also the theoretical appli¬ 
cation of this property to certain types of composite hypotheses which may be 
reduced thereby to equivalent simple hypotheses.^ The principal results of this 
nature may be summed up as follows; If x has distribution of the form/(x, 9), 
where 9 is either a location or scale parameter, or a vector denoting both, then 
there exists, in samples (xi, • • • , x«) a set of functions 3 /»(xi, ■ ■ > , x„), i = 
1, 2, • ■ ■ , p, p < n, having joint distribution D{yi , • ■ ■ , j/p) independent of 9, 
and such that the converse statement holds, namely, if {y<} have the distribution 
I • ■ iVp), ® has, for some 9, a distribution of the form f{x, 8). There 
is a corresponding statement when x has a distribution of the form/(x — 2a,- m,), 
where the {a,} are parameters, and the {«<) are regression variables.. 

^ We use the terms simple and composite hypotheses in the sense of Neyman and 
Pearson (2|. 
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2. Location and Scale. This section is devoted to the study of functions of 
the sample observations which arc such that their distributions determine the 
distribution of x, except possibly for location and scale 
It will be assumed that associated with x there is a function ¥{%) such that 

(a) F(x) is monotone non-decreasing, 

(b) = lim F{x) ~ 0, and (c) f(») = lim F{x) = 1 

a-»—no as-^co 

with the normalization F{x) upper semi-continuous. F{x) is the probability 
that the random variate takes a value less than or equal to x. If F{%) is as¬ 
sociated with the random variate x we say that x has the distribution F{x). 
If g{x) is a Borel-measurable function, the Lebesgue-Sticltjes integral 

f g(x) dF{x) is denoted by E[g{x)], 

J—00 


The charactenstic Junction <p{t) = determines F(x), that is, if 

f e''"dG{x)=\ e’‘*dF(x), thenF(x) = G^Cx). 

J—OO 

Similarly, let F(xi, • • • , x^) be such that 

(o) F{xi , ■■■ , x._i, X, + h, x,+i, ■ , xjb) > F(xi, ■.. , X,, ■ , x*,) for 

h > 0 and z = 1, 2, • • , fc; 

(b) lim F(xi, ,xk) = 0,1 = 1,2, ,k; 

(c) lim F(xi, • • <, Xi) = 1; 

,**-*00 

with the normalization F(xi, • ■ , xt) continuous on the right in each x,. If 
F(Xi, • • , Xu) is associated -with xi, • • , Xfc we say that xi, • , xn have the 

joint distribution F(xi, • , xjt). As before, E[II(xi, • , x*)] = / H dF, 

Jsk 

where Bk ia the Euclidean A:-apace. It is well known that under such condi¬ 
tions, given Borel-measurable functions y^{xi, ■ ■ ■ , xC), z = 1, • • ■ , p, p < fc, 

then G{yi, • • ■, J/p) = I dF{xi, • • •, x*), where B{'y) is the region [yi{xi, • ■ , 

^h) < Vi, ■ ■ ■ , 2/p(®i) • , ^ vA) is again a distribution function satisfying 

'the conditions above. Moreover, / g{yi, ■ , yj) dG{yi, ■ ■ ■ , yp) = 

Jr 


/ , Xk), ■ , Vpixi, ■ • , Xfc)] dF, where R' is the set of all points 

Jr' 

(xi, • ■ , Xfc) such that [yi(xi, • , x*,), • , yp(xi, ■ • , x*,)] e B. 

If X has distribution F{x), then, by definition, the set (xi, , Xn) is a sample 

from this distribution if Xi, • ■ , x» have the joint distribution i^(xi) • • F(x„) 
The following theorem states that two distributions giving rise, in sampling, 
to the same distribution of the set xi — Xa, Xj — x„, • • • , x„_i — x„ , with 
n > 3, can differ at most by a translation, that is, the distribution of that set 
determines the original distribution except for location. 

Theorem Ia: Let z have the distribution E(x). Denote by S the set of zeros of 
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j dF{x) and denote hy t the g.l.b. of \ l \ for t in S. Suppose that the comple¬ 
ment of S is e-connected.^ Suppose that x' Has distribution G{x'), and letxi, ■ ■ ■, x„ 
and x'l , ■ ■ , x'n he samples. Then the set Wa = Xa -• , a = 1, . ■ , n — 1 , 

have the same joint distribution as the set = Xa — Xn if and only if there exists 
a constant a such that x' a and x haive the same distribution. 

Proof: The sufficiency of the condition follows immediately, since w'a = 
x'^ - x'n = (x'„ + o) - (x'„ + a). 

In establishing necessity, only the fact that Mi, wi have the same joint dis¬ 
tribution as w'l , w't is needed. This hypothesis implies that 

that is, 

Set (p{t) = E{e'‘“‘), ^(i) = E{e'‘‘'). The relation above becomes 

(1) <p{ti)'P{U)<p{- ii-U) = ti - u). 

Consider equation (1) for values of k , U in the neighborhood of t = 0. (i5(0) = 
^(0) = 1, hence there is an interval | i | < i, in which and ^(t) do not 
vanish. It is easily shown that <p(0 and are each continuous, since e'‘*,in 
the neighborhood of i = 0, is continuous uniformly for any bounded interval 
of X, and since A may be chosen.so that 1 — F(A) and F{—A) are both as small 
as desired, In the in terval | i | < 5 the f unction /(<) = if{t)/4/{t) is continuous. 
Also, <p{ — t) = ip{t) and f(—<) = ^(f)* Setting = 0 in (1) we obtain 
<p(.t)ip{—t) ~ 'P(.t)^( — t), hence [ »j( 0 1 = | ], that is, \f{t) \ = 1. /(<) takes 

values on the unit circle of the complex plane, and /(O) = 1, hence there is an 
interval \l\ < S' such that z = f{t) lies on an arc y, of length less than 2ir, 
containing the point z = 1. Now consider the functional equation (1) for 
1 h 1 < 4fi', I ia I < (1) becomes 

mmfi- k - k) = 1. 

The interval | f [ < J' was so chosen that for j ii 1 < iS', | | < ^5', it is possible 

to define a single-valued branch of the argument of /(h), /(!*), and f{fi -p <a). 
Letting fa = 0 we have/(f)/( —i) = 1, hence, replacing/(— k — tf) by l//(fi + 
fa) in the last equation, we have 

f(.k)f(k) = fik + fa). 

■A-rg/(fi), arg/(f 2 ), and arg/(fi -\- k) are uniquely determined, except for some 
fixed multiple of 2ir. If we choose the principal value of the argument, i.e., so 

* The set S is e-conneoted if any two points p, g, in S can be connected by an t-chain, 

1. e., there exists a set Po = p, pi , • • ■ , pn-i i Pn = g, such that | p,- — p,_i | < «, t = 1, 

2, ■ ■ , n. 
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that 0 < arg /(/) < 2 ir, we must have 

arg/(ii) + arg/(< 2 ) = arg/(ii + t%) 


for I < 1 1 < 1 < 2 1 < 2 ^'- Since arg/(t) is continuous, any solution of this well 

known functional equation must be of the form arg f{t) = at. [/(<) | = 1 , 
therefore there exists a constant a such that /(f) = e’“‘, for | f | < that is, 
<p{t) = e’*V(0; for 1 f 1 < ^5'. By use of (1) this may be extended to hold for 
all i such that 11 j < e, where e is the minimum modulus of all t such that (p{t) = 0 . 
( 1 ) may now be used to extend the relation for all t such that ip(t) 7 ^ 0 by choos¬ 
ing an «-chain connecting the origin to the point t. We know 'already that 
tp(t) = e’“V(0 if <p(i) = Oj hence it holds for all t. This relation says that 
hence x' + a and x have the same distribution, thus 
completing the demonstration of the theorem. 

It should be remarked that the set (a:i — a;„ , • • , x„-i — in) may be replaced 
in Theorem la by any equivalent set, for example, (xi — x, , Xn-i — x). 

The next result is of the same natuie as Theorem la except for the replace¬ 
ment of the location parameter by a scale (positive or negative) parameter. 


f oo 

ed 


are nowhere dense, and let x' have distribution 0{x'). Let xi, • , Xn and 

xi, x'n be samples from the distributions of x and x', with n > 3, then the set 
Wa = XajXn , o! = 1, • • ■ , w — 1, hmo the same distribution as the set w'a = 
x'afxn if and only if there exists a constant e such that ex' and x have the same 
distribution. 

Proof: The sufficiency of the condition is evident. Suppose, then, as before, that $w'_1, w'_2$ have the same joint distribution as $w_1, w_2$. Then $\log|w'_1|$ and $\log|w'_2|$ have the same joint distribution as $\log|w_1|$ and $\log|w_2|$. Hence, by applying Theorem Ia to $\log|x|$ and $\log|x'|$, it follows (since the complement of a nowhere dense set is $\epsilon$-connected for every $\epsilon$) that there exists a constant $a$ such that

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it(\log|x'| - a)}\, dG(x').$$

Let $y = e^{-a}x'$. Then $|x|$ and $|y|$ have the same distribution, and

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|y|}\, dH(y), \tag{2}$$

where $y$ has distribution $H(y)$. We now have to show that either $y$ or $-y$ has the distribution of $x$, that is, either $H(y) = F(y)$ or $H(y) = 1 - F(-y)$.

GEORGE W. BROWN

By the first part of the theorem the functions $u_1 = y_1/y_3$ and $u_2 = y_2/y_3$ have the same joint distribution as $w_1, w_2$. Clearly the mean value of any function of $u_1$ and $u_2$ equals the mean value of the corresponding function of $w_1$ and $w_2$. Hence


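$$\iiint e^{i t_1\log|w_1| + i t_2\log|w_2|}\, \operatorname{sgn} w_1\, \operatorname{sgn} w_2\, dF(x_1)\, dF(x_2)\, dF(x_3) = \iiint e^{i t_1\log|u_1| + i t_2\log|u_2|}\, \operatorname{sgn} u_1\, \operatorname{sgn} u_2\, dH(y_1)\, dH(y_2)\, dH(y_3),$$

where $\operatorname{sgn} x = 1$ for $x > 0$ and $\operatorname{sgn} x = -1$ for $x < 0$. Since $(\operatorname{sgn} w_1)(\operatorname{sgn} w_2) = (\operatorname{sgn} x_1)(\operatorname{sgn} x_2)$, and similarly $(\operatorname{sgn} u_1)(\operatorname{sgn} u_2) = (\operatorname{sgn} y_1)(\operatorname{sgn} y_2)$, the last equation becomes

$$\iiint e^{i t_1(\log|x_1| - \log|x_3|) + i t_2(\log|x_2| - \log|x_3|)}\, \operatorname{sgn} x_1\, \operatorname{sgn} x_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$$
$$= \iiint e^{i t_1(\log|y_1| - \log|y_3|) + i t_2(\log|y_2| - \log|y_3|)}\, \operatorname{sgn} y_1\, \operatorname{sgn} y_2\, dH(y_1)\, dH(y_2)\, dH(y_3). \tag{3}$$

Set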


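$$\varphi_1(t) = \int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x), \qquad \varphi_2(t) = \int_{-\infty}^{\infty} e^{it\log|x|}\, \operatorname{sgn} x\, dF(x),$$
$$\psi_1(t) = \int_{-\infty}^{\infty} e^{it\log|y|}\, dH(y), \qquad \psi_2(t) = \int_{-\infty}^{\infty} e^{it\log|y|}\, \operatorname{sgn} y\, dH(y).$$

From (3) we have $\varphi_2(t_1)\varphi_2(t_2)\varphi_1(-t_1 - t_2) = \psi_2(t_1)\psi_2(t_2)\psi_1(-t_1 - t_2)$ for all $t_1, t_2$, and from (2) we have $\varphi_1(t) = \psi_1(t)$ for all $t$. Hence, if $\varphi_1(-t_1 - t_2) \ne 0$, then $\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2)$. By hypothesis the zeros of $\varphi_1(t)$ are nowhere dense. Hence, if $\varphi_1(-t_1 - t_2) = 0$, there is a sequence $t^{(n)}$ such that $t^{(n)} \to -t_1 - t_2$ and $\varphi_1(t^{(n)}) \ne 0$. Now take an arbitrary sequence $t_1^{(n)}$ such that $t_1^{(n)} \to t_1$; then $t_2^{(n)} = -t^{(n)} - t_1^{(n)}$ must tend to $t_2$. For each $n$ we have $\varphi_2(t_1^{(n)})\varphi_2(t_2^{(n)}) = \psi_2(t_1^{(n)})\psi_2(t_2^{(n)})$. All the functions appearing are continuous, so $\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2)$ for all $t_1, t_2$. It follows directly that either $\psi_2(t) = \varphi_2(t)$ for all $t$ or $\psi_2(t) = -\varphi_2(t)$ for all $t$. We have†

$$\varphi_1(t) = \int_0^{\infty} e^{it\log x}\, dF(x) + \int_{-\infty}^{0} e^{it\log(-x)}\, dF(x),$$
$$\varphi_2(t) = \int_0^{\infty} e^{it\log x}\, dF(x) - \int_{-\infty}^{0} e^{it\log(-x)}\, dF(x),$$
$$\psi_1(t) = \int_0^{\infty} e^{it\log x}\, dH(x) + \int_{-\infty}^{0} e^{it\log(-x)}\, dH(x),$$

and $\psi_2(t)$ is the same as $\psi_1(t)$ except for a minus sign before the second integral.

† The assumption has been made implicitly that $F(x)$ and $G(x)$ are continuous at $x = 0$; otherwise the distribution of $x_\alpha/x_n$ is not properly defined, and the functions $\varphi_i(t)$ and $\psi_i(t)$ are then not defined. Similar assumptions will be made whenever necessary in later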

Combining these expressions with the relations obtained above leads, by Fourier inversion, to the result that either $F(x) = H(x)$ or $H(x) = 1 - F(-x)$. We have shown that either $y$ or $-y$ has the same distribution as $x$, that is, either $e^{-a}x'$ or $-e^{-a}x'$ has the same distribution as $x$.

Theorem Ib states essentially that the joint distribution of the set $x_\alpha/x_n$, $\alpha = 1, \ldots, n-1$, determines the distribution of $x$ except for a scale parameter and possibly a reflection. If $x$ has an asymmetrical distribution and it is desired to rule out negative changes of scale, a variation of this procedure is necessary. The next result is appropriate for this situation.

Theorem Ic: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{it\log|x|}\, dF(x)$ are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \ge 3$. Express $x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$ in spherical coordinates:

$$\begin{aligned}
x_1 &= r\cos\theta_1, & x'_1 &= r'\cos\theta'_1,\\
x_2 &= r\sin\theta_1\cos\theta_2, & x'_2 &= r'\sin\theta'_1\cos\theta'_2,\\
&\;\;\vdots & &\;\;\vdots\\
x_n &= r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-1}, & x'_n &= r'\sin\theta'_1\sin\theta'_2\cdots\sin\theta'_{n-1}.
\end{aligned}$$

Then $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$ if and only if there exists a positive constant $k$ such that $kx'$ and $x$ have the same distribution.
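The Python sketch below is ours and assumes the usual angle ranges: $\theta_1, \ldots, \theta_{n-2} \in [0, \pi]$ and $\theta_{n-1} \in (-\pi, \pi]$. It computes the angular coordinates of Theorem Ic. A positive scale factor leaves the angles unchanged. A reflection $x \to -x$ changes them, which is why Theorem Ic rules out negative changes of scale while Theorem Ib does not. The function name is hypothetical.

```python
import math


def angular_coordinates(sample):
    """Angles theta_1, ..., theta_{n-1} with
    x_1 = r cos(theta_1), x_2 = r sin(theta_1) cos(theta_2), ...,
    x_n = r sin(theta_1) ... sin(theta_{n-1}).

    Assumes the trailing coordinates are not all zero (true almost surely
    for a continuous distribution).
    """
    n = len(sample)
    angles = []
    for j in range(n - 2):
        tail_norm = math.sqrt(sum(x * x for x in sample[j:]))  # norm of remaining coordinates
        ratio = max(-1.0, min(1.0, sample[j] / tail_norm))       # guard against rounding
        angles.append(math.acos(ratio))
    angles.append(math.atan2(sample[-1], sample[-2]))  # last angle keeps the sign of x_n
    return angles


sample = [1.0, -2.0, 0.5, 3.0]
theta = angular_coordinates(sample)
theta_pos = angular_coordinates([2.5 * x for x in sample])   # positive scale change
theta_neg = angular_coordinates([-1.0 * x for x in sample])  # reflection

print(all(math.isclose(a, b) for a, b in zip(theta, theta_pos)))  # True
print(all(math.isclose(a, b) for a, b in zip(theta, theta_neg)))  # False
```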

Proof: Sufficiency of the condition is an immediate consequence of the fact that $\theta_1, \ldots, \theta_{n-1}$ are invariant under the transformation $x = kx'$ with $k > 0$. If $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$, then the set $\{x_\alpha/x_n\}$ has the same joint distribution as the set $\{x'_\alpha/x'_n\}$. Hence, by Theorem Ib, there exists a constant $c$ such that $cx'$ has the same distribution as $x$. To establish necessity of the condition we must show that $|c|x'$ has the same distribution as $x$.

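Set $y = |c|x'$, and express $y_1, \ldots, y_n$ in spherical coordinates; $y_1, \ldots, y_n$ have the same angular coordinates $\theta'_1, \ldots, \theta'_{n-1}$. This implies that $x_1/r$ and $x_2/r$ have the same joint distribution as $y_1/R$ and $y_2/R$, where $R = (y_1^2 + \cdots + y_n^2)^{1/2}$. Therefore $x_1/|x_2|$ has the same distribution as $y_1/|y_2|$, so that

$$\iint e^{it(\log|x_1| - \log|x_2|)}\, \operatorname{sgn}\frac{x_1}{|x_2|}\, dF(x_1)\, dF(x_2) = \iint e^{it(\log|y_1| - \log|y_2|)}\, \operatorname{sgn}\frac{y_1}{|y_2|}\, dH(y_1)\, dH(y_2)$$

if $y$ has distribution $H(y)$. Since $\operatorname{sgn}(x_1/|x_2|) = \operatorname{sgn} x_1$, the last equation yields

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, \operatorname{sgn} x\, dF(x) \cdot \int_{-\infty}^{\infty} e^{-it\log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|x|}\, \operatorname{sgn} x\, dH(x) \cdot \int_{-\infty}^{\infty} e^{-it\log|x|}\, dH(x).$$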

We know already that $|x|$ and $|y|$ have the same distribution, so that

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|x|}\, dH(x); \tag{4}$$

thus

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, \operatorname{sgn} x\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|x|}\, \operatorname{sgn} x\, dH(x), \tag{5}$$

except possibly at values of $t$ where $\int e^{-it\log|x|}\, dF(x)$ vanishes. By hypothesis these exceptional points are nowhere dense, so, by continuity, (5) holds for all $t$. Together, (4) and (5) imply, as in the proof of Theorem Ib, that $F(x) = H(x)$, i.e., $x$ and $|c|x'$ have the same distribution.

The next three results generalize Theorems Ia, Ib, and Ic to analogous multivariate situations. The first of these is a direct generalization of Theorem Ia.

Theorem IIa: Let $x_1, \ldots, x_k$ have joint distribution $F(x_1, \ldots, x_k)$ such that the complement of the set $S$ of zeros of $\int e^{i\sum_j t_j x_j}\, dF(x_1, \ldots, x_k)$ is $\epsilon$-connected, where $\epsilon$ is the g.l.b. of $|t|$ for $(t)$ in $S$, and let $y_1, \ldots, y_k$ have joint distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$, be samples from these distributions, with $n \ge 3$. Then $w_i^\beta = x_i^\beta - x_i^n$, $i = 1, \ldots, k$, $\beta = 1, \ldots, n-1$, have the same joint distribution as the corresponding set $v_i^\beta = y_i^\beta - y_i^n$ if and only if there exist constants $a_1, \ldots, a_k$ such that $y_1 + a_1, \ldots, y_k + a_k$ have the same joint distribution as $x_1, \ldots, x_k$.

Proof: Set

$$\varphi(t_1, \ldots, t_k) = \int e^{i\sum_j t_j x_j}\, dF(x_1, \ldots, x_k), \qquad \psi(t_1, \ldots, t_k) = \int e^{i\sum_j t_j y_j}\, dG(y_1, \ldots, y_k).$$

If $w_i^\beta$, $i = 1, \ldots, k$, $\beta = 1, 2$, have the same joint distribution as $v_i^\beta$, then, as in the proof of Theorem Ia, we have

$$\varphi(t_{11}, \ldots, t_{k1})\,\varphi(t_{12}, \ldots, t_{k2})\,\varphi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2}) = \psi(t_{11}, \ldots, t_{k1})\,\psi(t_{12}, \ldots, t_{k2})\,\psi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2}). \tag{6}$$
Again, as before, $|\varphi| = |\psi|$; $\varphi(t_1, \ldots, t_k)$ and $\psi(t_1, \ldots, t_k)$ are continuous; and $\varphi(0, \ldots, 0) = \psi(0, \ldots, 0) = 1$. There exists a neighborhood $N$ of $(0, \ldots, 0)$ such that, for $(t_1, \ldots, t_k) \in N$, the function

$$f(t_1, \ldots, t_k) = \frac{\varphi(t_1, \ldots, t_k)}{\psi(t_1, \ldots, t_k)}$$

is defined and continuous. There then exists a neighborhood $N' \subset N$ in which a uniquely determined branch of $\arg f(t_1, \ldots, t_k)$ exists, continuous in $N'$, with the further property that if $(t_1, \ldots, t_k) \in N'$ and $(u_1, \ldots, u_k) \in N'$, then $\arg f(t_1 + u_1, \ldots, t_k + u_k)$ is also uniquely determined and continuous. For $(t) \in N'$ and $(u) \in N'$, $\arg f$ satisfies the relation

$$\arg f(t_1, \ldots, t_k) + \arg f(u_1, \ldots, u_k) = \arg f(t_1 + u_1, \ldots, t_k + u_k).$$

It is easily shown that any continuous function satisfying this equation must be of the form $\sum_j a_j t_j$; therefore

$$\varphi(t_1, \ldots, t_k) = e^{i\sum_j a_j t_j}\,\psi(t_1, \ldots, t_k), \qquad (t) \in N'. \tag{7}$$

Just as in the proof of Theorem Ia, relation (7) may be extended, by use of (6), to hold for all $t$. This implies, finally, that the set $\{y_i + a_i\}$ has the same joint distribution as the set $\{x_i\}$.

Theorem IIb is a generalization of Theorem Ib to multivariate distributions.

Theorem IIb: Let $x_1, \ldots, x_k$ have distribution $F(x_1, \ldots, x_k)$ such that the zeros of $\int e^{i\sum_j t_j\log|x_j|}\, dF(x_1, \ldots, x_k)$ are nowhere dense, and let $y_1, \ldots, y_k$ have distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$, be samples, with $n \ge 3$. Then the set $w_i^\beta = x_i^\beta/x_i^n$, $i = 1, \ldots, k$, $\beta = 1, \ldots, n-1$, has the same joint distribution as the corresponding set $v_i^\beta = y_i^\beta/y_i^n$ if and only if there exist constants $c_1, \ldots, c_k$ such that the set $\{c_i y_i\}$ has the same distribution as the set $\{x_i\}$.

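Proof: The demonstration is parallel to that of Theorem Ib. By Theorem IIa there exist $a_1, \ldots, a_k$ such that $\{\log|x_i|\}$ has the same joint distribution as $\{\log|y_i| + a_i\}$. Set $z_i = e^{a_i}y_i$; then

$$\int e^{i\sum_j t_j\log|x_j|}\, dF(x_1, \ldots, x_k) = \int e^{i\sum_j t_j\log|z_j|}\, dH(z_1, \ldots, z_k), \tag{8}$$

where $(z_1, \ldots, z_k)$ have distribution function $H(z_1, \ldots, z_k)$.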

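We shall continue the proof from here under the assumption that $k = 2$; it will be evident how the proof goes for any $k$. Since the $z_j^\beta/z_j^3$ have the same joint distribution as the $x_j^\beta/x_j^3$, we have

$$\iiint \exp\Big\{i\sum_{j=1}^{2}\sum_{\beta=1}^{2} t_{j\beta}\big(\log|x_j^\beta| - \log|x_j^3|\big)\Big\}\, \operatorname{sgn}\frac{x_1^1}{x_1^3}\, \operatorname{sgn}\frac{x_1^2}{x_1^3}\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)\, dF(x_1^3, x_2^3)$$
$$= \iiint \exp\Big\{i\sum_{j=1}^{2}\sum_{\beta=1}^{2} t_{j\beta}\big(\log|x_j^\beta| - \log|x_j^3|\big)\Big\}\, \operatorname{sgn}\frac{x_1^1}{x_1^3}\, \operatorname{sgn}\frac{x_1^2}{x_1^3}\, dH(x_1^1, x_2^1)\, dH(x_1^2, x_2^2)\, dH(x_1^3, x_2^3). \tag{9}$$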
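Both members of (9) are evaluated as products, just as in previous proofs. From the result, combined with (8), we conclude, as in Theorem Ib, that

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, dF(x_1, x_2) = s_1 \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, dH(x_1, x_2)$$

for all $(t_1, t_2)$, where $s_1 = \pm 1$. Similarly

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_2\, dF(x_1, x_2) = s_2 \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_2\, dH(x_1, x_2)$$

and

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, \operatorname{sgn} x_2\, dF(x_1, x_2) = s_3 \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, \operatorname{sgn} x_2\, dH(x_1, x_2),$$

with $s_2 = \pm 1$, $s_3 = \pm 1$. Set

$$\begin{aligned}
\varphi_1(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, dF(x_1, x_2),\\
\varphi_2(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_2\, dF(x_1, x_2),\\
\varphi_{12}(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, \operatorname{sgn} x_2\, dF(x_1, x_2),
\end{aligned}$$

and let $\psi_1(t_1, t_2)$, $\psi_2(t_1, t_2)$, and $\psi_{12}(t_1, t_2)$ denote the corresponding transforms of $H(x_1, x_2)$. We have

$$\varphi_1(t_1, t_2) = s_1\psi_1(t_1, t_2), \qquad \varphi_2(t_1, t_2) = s_2\psi_2(t_1, t_2), \qquad \varphi_{12}(t_1, t_2) = s_3\psi_{12}(t_1, t_2), \tag{10}$$

with $s_1 = \pm 1$, $s_2 = \pm 1$, and $s_3 = \pm 1$.

Now, as in (9), by considering

$$\iiint \exp\Big\{i\sum_{j=1}^{2}\sum_{\beta=1}^{2} t_{j\beta}\big(\log|x_j^\beta| - \log|x_j^3|\big)\Big\}\, \operatorname{sgn}\frac{x_1^1}{x_1^3}\, \operatorname{sgn}\frac{x_2^2}{x_2^3}\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)\, dF(x_1^3, x_2^3),$$

we obtain the relation

$$\varphi_1(t_{11}, t_{21})\,\varphi_2(t_{12}, t_{22})\,\varphi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}) = \psi_1(t_{11}, t_{21})\,\psi_2(t_{12}, t_{22})\,\psi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}),$$

which shows that $s_1, s_2, s_3$ may be chosen so that $s_1 s_2 s_3 = 1$, that is, $s_1 s_2 = s_3$.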



COMPOSITE HYPOTHESES 




Consider now the variates $z'_r = s_r z_r$, $r = 1, 2$, and let $K(z'_1, z'_2)$ be the distribution function of $z'_1, z'_2$. Let $\theta_1(t_1, t_2)$, $\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$ be the transforms of $K$ corresponding to $\varphi_1(t_1, t_2)$, $\varphi_2(t_1, t_2)$, and $\varphi_{12}(t_1, t_2)$ respectively. It is evident from (10) and $s_1 s_2 = s_3$ that

$$\varphi_1(t_1, t_2) = \theta_1(t_1, t_2), \qquad \varphi_2(t_1, t_2) = \theta_2(t_1, t_2), \qquad \varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2). \tag{11}$$

Moreover, from (8),

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, dF(x_1, x_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, dK(x_1, x_2).$$

This relation, together with equations (11), implies that $F$ and $K$ coincide in each quadrant; thus $F(x_1, x_2) = K(x_1, x_2)$ for all $x_1, x_2$.

The final result is that $z'_1, z'_2$ have the same distribution as $x_1, x_2$, i.e., $s_1 e^{a_1} y_1$ and $s_2 e^{a_2} y_2$ have the same joint distribution as $x_1$ and $x_2$.

The next result bears the same relation to Theorem IIb that Theorem Ic bears to Theorem Ib; that is, only positive scale changes are permitted.

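Theorem IIc: Let $x_1, \ldots, x_k$ have distribution $F(x_1, \ldots, x_k)$ such that the zeros of $\int e^{i\sum_j t_j\log|x_j|}\, dF(x_1, \ldots, x_k)$ are nowhere dense, and let $y_1, \ldots, y_k$ have distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, 2, \ldots, n$, be samples with $n \ge 3$. Express $x_i^1, \ldots, x_i^n$ and $y_i^1, \ldots, y_i^n$ in spherical coordinates:

$$\begin{aligned}
x_i^1 &= r_i\cos\theta_i^1, & y_i^1 &= R_i\cos\varphi_i^1,\\
x_i^2 &= r_i\sin\theta_i^1\cos\theta_i^2, & y_i^2 &= R_i\sin\varphi_i^1\cos\varphi_i^2,\\
&\;\;\vdots & &\;\;\vdots\\
x_i^n &= r_i\sin\theta_i^1\cdots\sin\theta_i^{n-1}, & y_i^n &= R_i\sin\varphi_i^1\cdots\sin\varphi_i^{n-1}.
\end{aligned}$$

Then $\{\theta_i^\beta\}$, $i = 1, \ldots, k$, $\beta = 1, \ldots, n-1$, have the same joint distribution as $\{\varphi_i^\beta\}$ if and only if there exist constants $k_i > 0$, $i = 1, \ldots, k$, such that the set $\{k_i y_i\}$ has the same joint distribution as the set $\{x_i\}$.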


Proof: If $\{\theta_i^\beta\}$ have the same distribution as $\{\varphi_i^\beta\}$, then $\{x_i^\beta/x_i^n\}$ have the same distribution as $\{y_i^\beta/y_i^n\}$. Hence, by Theorem IIb, there exist constants $c_i$ such that $\{c_i y_i\}$ have the same distribution as $\{x_i\}$. Set $z_i = |c_i|\,y_i$; we wish to show that $\{z_i\}$ have the same distribution as $\{x_i\}$. By equation (8) in Theorem IIb, $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$. Moreover, if we express $z_i^\alpha$ in spherical coordinates, the angular coordinates are the same as those of $y_i^\alpha$. Therefore $\{x_i^1/|x_i^2|\}$ have the same distribution as $\{z_i^1/|z_i^2|\}$, since these functions can be expressed in terms of the angular coordinates.

As before, we continue the proof under the assumption that $k = 2$. The procedure generalizes that of the proof of Theorem Ic. Since $\operatorname{sgn}(x_i^1/|x_i^2|) = \operatorname{sgn} x_i^1$, and similarly for $z$, we have

$$\iint e^{i\sum_{j=1}^{2} t_j(\log|x_j^1| - \log|x_j^2|)}\, \operatorname{sgn} x_i^1\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2) = \iint e^{i\sum_{j=1}^{2} t_j(\log|x_j^1| - \log|x_j^2|)}\, \operatorname{sgn} x_i^1\, dH(x_1^1, x_2^1)\, dH(x_1^2, x_2^2), \qquad i = 1, 2, \tag{12}$$

where $z_1, z_2$ are assumed to have distribution $H(z_1, z_2)$. As before, set

$$\begin{aligned}
\varphi(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, dF(x_1, x_2),\\
\varphi_i(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_i\, dF(x_1, x_2), \qquad i = 1, 2,\\
\varphi_{12}(t_1, t_2) &= \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, \operatorname{sgn} x_1\, \operatorname{sgn} x_2\, dF(x_1, x_2),
\end{aligned}$$

and denote the corresponding transforms of $H(x_1, x_2)$ by $\theta(t_1, t_2)$, $\theta_i(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$. We have already noted that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$; therefore $\theta(t_1, t_2) = \varphi(t_1, t_2)$. Equation (12) yields the relation $\varphi_i(t_1, t_2)\,\varphi(-t_1, -t_2) = \theta_i(t_1, t_2)\,\theta(-t_1, -t_2)$, $i = 1, 2$. The zeros of $\varphi(t_1, t_2)$ are nowhere dense, so we can conclude that $\varphi_i(t_1, t_2) = \theta_i(t_1, t_2)$, $i = 1, 2$. From an equation similar to (12) we obtain $\varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2)$. As in Theorem IIb, these four relations together imply that $F(x_1, x_2) = H(x_1, x_2)$; in other words, $\{|c_i|\,y_i\}$ have the same distribution as $\{x_i\}$.

We are now in a position to combine some of the preceding theorems so as to 
obtain analogous results for scale and location parameters together. 

Theorem IIIa: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{itx}\, dF(x)$ satisfy the condition of Theorem Ia, and the zeros of

$$\iiint e^{i t_1\log|x_1 - x_3| + i t_2\log|x_2 - x_3|}\, dF(x_1)\, dF(x_2)\, dF(x_3)$$

are nowhere dense, and let $y$ have distribution $G(y)$. Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be samples, with $n \ge 9$. Then

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \ldots, n-2,$$

have the same joint distribution as the corresponding set $w'_\alpha = (y_\alpha - y_n)/(y_{n-1} - y_n)$ if and only if there exist constants $a$, $c$ such that $c(y - a)$ and $x$ have the same distribution.
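The sufficiency half of Theorem IIIa rests on the invariance of $w_\alpha = (x_\alpha - x_n)/(x_{n-1} - x_n)$ under every affine change $c(x - a)$ with $c \ne 0$. The Python sketch below is ours and checks that invariance numerically for one positive and one negative $c$. The helper name is hypothetical.

```python
def location_scale_free(sample):
    """w_alpha = (x_alpha - x_n) / (x_{n-1} - x_n), alpha = 1, ..., n - 2."""
    x_last, x_prev = sample[-1], sample[-2]
    denom = x_prev - x_last  # assumed nonzero (true almost surely for a continuous F)
    return [(x - x_last) / denom for x in sample[:-2]]


sample = [0.3, 2.1, -1.4, 0.9, 4.0]
print(location_scale_free(sample))
for a, c in ((1.0, 3.0), (-2.0, -0.5)):
    transformed = [c * (x - a) for x in sample]  # affine change c(x - a)
    print(a, c, location_scale_free(transformed))
```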
Proof: Sufficiency of the condition is an immediate consequence of the fact that $w'_\alpha$ is invariant under transformations of the form $y' = c(y - a)$. Assume, then, that $\{w_\alpha\}$ and $\{w'_\alpha\}$ have the same joint distribution. By elementary transformations it is evident that the functions

$$\frac{x_1 - x_3}{x_7 - x_9}, \quad \frac{x_2 - x_3}{x_8 - x_9}, \quad \frac{x_4 - x_6}{x_7 - x_9}, \quad \frac{x_5 - x_6}{x_8 - x_9}$$

have the same joint distribution as the corresponding functions of the $y$'s, if $n \ge 9$. Since $x_1, \ldots, x_n$ form a sample, the pairs $(x_1 - x_3, x_2 - x_3)$, $(x_4 - x_6, x_5 - x_6)$, $(x_7 - x_9, x_8 - x_9)$ have the same joint distributions and are pairwise independent, and similarly for the corresponding functions of the $y$'s. Theorem IIb assures the existence of constants $c_1, c_2$ such that $c_1(y_1 - y_3)$, $c_2(y_2 - y_3)$ have the same joint distribution as $x_1 - x_3$, $x_2 - x_3$. Considering the marginal distributions separately, we see that $c_1(y_1 - y_3)$ has the same distribution as $c_2(y_2 - y_3)$. Since $y_1 - y_3$ and $y_2 - y_3$ have the same distribution, either $c_2 = c_1$ or $c_2 = -c_1$. Set $u_\alpha = x_\alpha - x_3$, $v_\alpha = c_1(y_\alpha - y_3)$, $\alpha = 1, 2$. For the distributions of $(u_1, u_2)$ and $(v_1, v_2)$ we have relations corresponding to (10) in Theorem IIb, with the additional condition that $s_1 = s_2$, because of the symmetry in the variables. This implies that either $(v_1, v_2)$ or $(-v_1, -v_2)$ has the same joint distribution as $(u_1, u_2)$. That is, there exists $c$ such that $c(y_1 - y_3)$ and $c(y_2 - y_3)$ have the same joint distribution as $x_1 - x_3$ and $x_2 - x_3$. Application of Theorem Ia now completes the proof.

Just as before, there is an analogous result when we consider angular coordinates instead of quotients. The proof is immediate: the angular coordinates determine the angular coordinates of $(x_1 - x_3, x_2 - x_3)$, $(x_4 - x_6, x_5 - x_6)$, and $(x_7 - x_9, x_8 - x_9)$, arranged as a sample. The constants $c_1, c_2$ in the proof of Theorem IIIa are then both positive, so $c_1 = c_2$. Application of Theorem Ia gives

Theorem IIIb: Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ satisfy the hypotheses of Theorem IIIa. Set

$$\begin{aligned}
x_1 - x_n &= r\cos\theta_1, & y_1 - y_n &= r'\cos\theta'_1,\\
x_2 - x_n &= r\sin\theta_1\cos\theta_2, & y_2 - y_n &= r'\sin\theta'_1\cos\theta'_2,\\
&\;\;\vdots & &\;\;\vdots\\
x_{n-1} - x_n &= r\sin\theta_1\cdots\sin\theta_{n-2}, & y_{n-1} - y_n &= r'\sin\theta'_1\cdots\sin\theta'_{n-2}.
\end{aligned}$$

Then $\theta_1, \ldots, \theta_{n-2}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-2}$ if and only if there exist constants $a$ and $c > 0$ such that $c(y - a)$ has the same distribution as $x$.

Theorem IVa generalizes Theorem Ia to arbitrary linear combinations of some subset of the sample.

Theorem IVa: Suppose $x$ has distribution $F(x)$ such that $\int e^{itx}\, dF(x)$ does not vanish, and let $y$ have distribution $G(y)$. Consider the functions

$$w_\alpha = x_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\, x_{m+\beta}, \qquad w'_\alpha = y_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\, y_{m+\beta}, \qquad \alpha = 1, 2, \ldots, m,$$

and suppose that $m > n - m$. Then, if $\{w'_\alpha\}$ have the same joint distribution as $\{w_\alpha\}$ and if $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for some $\alpha$, it follows that $F(y) \equiv G(y)$. If $\sum_{\beta=1}^{n-m} l_{\alpha\beta} = 1$ for all $\alpha$, there exists a constant $a$ such that $F(y - a) = G(y)$.
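The exceptional case of Theorem IVa, $\sum_\beta l_{\alpha\beta} = 1$ for every $\alpha$, is easy to see numerically. Under $x \to x + a$ each $w_\alpha$ changes by $a(1 - \sum_\beta l_{\alpha\beta})$, so when every row sum is 1 the statistics cannot detect location. The Python sketch below is ours; the coefficient matrices and the function name are hypothetical.

```python
def linear_statistics(sample, m, l):
    """w_alpha = x_alpha - sum_beta l[alpha][beta] * x_{m+beta}, alpha = 1, ..., m.

    `sample` holds x_1, ..., x_n; `l` is an m x (n - m) coefficient matrix.
    """
    tail = sample[m:]  # x_{m+1}, ..., x_n
    return [sample[a] - sum(coef * x for coef, x in zip(l[a], tail)) for a in range(m)]


sample = [1.0, 4.0, -2.0, 3.0, 0.5]                      # n = 5, m = 3, so m > n - m
l_rows_sum_to_one = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]
l_general = [[0.5, 0.2], [1.0, 0.0], [0.25, 0.75]]       # first row sums to 0.7
shifted = [x + 7.0 for x in sample]                      # location change, a = 7

print(linear_statistics(sample, 3, l_rows_sum_to_one))
print(linear_statistics(shifted, 3, l_rows_sum_to_one))  # unchanged
# With row sum 0.7, the first statistic moves by a * (1 - 0.7) = 2.1.
print(linear_statistics(sample, 3, l_general)[0], linear_statistics(shifted, 3, l_general)[0])
```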

Proof: Denote the characteristic functions of $x$ and $y$ by $\varphi(t)$ and $\psi(t)$ respectively. Expressing the fact that $\{w_\alpha\}$ and $\{w'_\alpha\}$, $\alpha = 1, 2, \ldots, n - m + 1$, have the same characteristic function, we obtain the functional equation

$$\prod_{\alpha=1}^{n-m+1}\varphi(t_\alpha)\prod_{\beta=1}^{n-m}\varphi\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\Big) = \prod_{\alpha=1}^{n-m+1}\psi(t_\alpha)\prod_{\beta=1}^{n-m}\psi\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\Big).$$

By hypothesis $\varphi(t)$ does not vanish; therefore, by the relation above, $\psi(t)$ has no zeros. Since $\varphi(t)$ and $\psi(t)$ are continuous, the function $f(t) = \log\varphi(t) - \log\psi(t)$ can be defined uniquely and continuously for all $t$, with $f(0) = 0$. The equation above becomes

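$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\Big) = 0. \tag{13}$$

The $n - m + 1$ vectors $(l_{\alpha 1}, \ldots, l_{\alpha, n-m})$ lie in an $(n - m)$-dimensional space and are therefore linearly dependent, so for some $\alpha$ the $l_{\alpha\beta}$ can be expressed as a linear combination of the others. Suppose, then, that

$$l_{n-m+1,\beta} = \sum_{\alpha=1}^{n-m} c_\alpha l_{\alpha\beta}, \qquad \beta = 1, \ldots, n-m.$$

Putting these values in (13), we have

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\Big(-\sum_{\alpha=1}^{n-m} l_{\alpha\beta}(t_\alpha + c_\alpha t_{n-m+1})\Big) = 0. \tag{14}$$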

We may assume that $c_1 \ne 0$. For if $c_\alpha = 0$ for all $\alpha$, then $l_{n-m+1,\beta} = 0$, $\beta = 1, \ldots, n-m$; that is, $w_{n-m+1} = x_{n-m+1}$ and $w'_{n-m+1} = y_{n-m+1}$, and hence $x$ and $y$ have the same distribution. Assuming $c_1 \ne 0$, set $t_\alpha = -c_\alpha t_{n-m+1}$, $\alpha = 2, \ldots, n-m$, in (14), obtaining

$$f(t_1) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) + f(t_{n-m+1}) + \sum_{\beta=1}^{n-m} f\big(-l_{1\beta}(t_1 + c_1 t_{n-m+1})\big) = 0. \tag{15}$$

Now, recalling that $f(0) = 0$, set $t_{n-m+1} = 0$ to get $f(t_1) + \sum_{\beta=1}^{n-m} f(-l_{1\beta}t_1) = 0$. Evaluating this at the argument $t_1 + c_1 t_{n-m+1}$ and substituting back in (15), we find

$$f(t_1) + f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) = f(t_1 + c_1 t_{n-m+1}). \tag{16}$$

Setting $t_1 = 0$ in (16) gives the relation

$$f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) = f(c_1 t_{n-m+1});$$

thus, finally, $f(t_1) + f(c_1 t_{n-m+1}) = f(t_1 + c_1 t_{n-m+1})$, or, since $c_1 \ne 0$, $f(t_1 + t_2) = f(t_1) + f(t_2)$. Since $f(t)$ is continuous, this relation implies that $f(t) = ct$. Now replace $f(t)$ by $ct$ in (13), getting

$$c\Big(\sum_{\alpha=1}^{n-m+1} t_\alpha - \sum_{\alpha=1}^{n-m+1}\sum_{\beta=1}^{n-m} l_{\alpha\beta}t_\alpha\Big) = 0,$$

that is, either $c = 0$ or $\sum_{\beta=1}^{n-m} l_{\alpha\beta} = 1$ for all $\alpha$. We conclude that $\varphi(t) = \psi(t)$ unless $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. If $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$, we have $\varphi(t) = e^{ct}\psi(t)$. Since $\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$, $c$ is of the form $c = ia$ with $a$ real; in other words, $\varphi(t) = e^{iat}\psi(t)$. This concludes the proof of the theorem.

It was assumed in Theorem IVa that $\varphi(t)$ has no zeros. If $\varphi(t)$ has zeros, we have proved that, on an interval $|t| < \epsilon$, $\varphi(t) = \psi(t)$ (or $\varphi(t) = e^{iat}\psi(t)$). This does not necessarily imply the result of Theorem IVa, but it does imply at least that if the $k$th moments of $x$ and of $y$ (or of $y - a$) both exist, they are equal.

The last result in this series can be proved by methods similar to those used in Theorem IVa.

Theorem IVb: Let $x$ and $y$ satisfy the hypotheses of Theorem IVa. Suppose, moreover, that $m > 2(n - m)$, that the rank of $\|l_{\alpha\beta}\|$ is $n - m$, and that $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for at least $2m - n$ values of $\alpha$. Then, if there exist constants $\{c_\alpha\}$ such that the set $\{c_\alpha w'_\alpha\}$ has the same joint distribution as $\{w_\alpha\}$, it follows that, for some $\alpha$, $c_\alpha y$ has the same distribution as $x$.


3. Application to Composite Hypotheses. The results of section 2 have a significant application in the theory of testing composite hypotheses. Suppose that $x$ has a distribution of the form $F(x, \theta_1, \theta_2)$, and that the hypothesis $\theta_2 = \theta_2^0$ is to be tested without reference to the value of $\theta_1$. We assume that the parameters are independent, i.e., $F(x, \theta_1, \theta_2) \equiv F(x, \theta'_1, \theta'_2)$ implies that $\theta_1 = \theta'_1$ and $\theta_2 = \theta'_2$. In a wide class of important cases, given a sample $x_1, \ldots, x_n$ from the distribution $F(x, \theta_1, \theta_2)$, there exist functions $y_\alpha(x_1, \ldots, x_n)$, $\alpha = 1, 2, \ldots, p$, such that $\{y_\alpha\}$ have a joint distribution independent of $\theta_1$ but depending on $\theta_2$. If the $\{y_\alpha\}$ are such that their joint distribution redetermines the original distribution, except for $\theta_1$, one can reasonably use the $p$-dimensional distribution of the $\{y_\alpha\}$ for testing the hypothesis $\theta_2 = \theta_2^0$, thus reducing the composite hypothesis to a simple hypothesis. In testing this simple hypothesis, every alternative hypothesis (corresponding to a value of $\theta_2$) determines a distribution of $x$ among the alternatives $F(x, \theta_1, \theta_2)$ except for the unknown $\theta_1$. That is, there is a one-to-one correspondence between the two sets of alternative hypotheses, expressed by the fact that if $\theta_2 \ne \theta'_2$, then the distributions of the set $\{y_\alpha\}$ corresponding to $\theta_2$ and $\theta'_2$ must be different.

Suppose, for example, that it is desired to test whether $y = x - a$, for some $a$, has the distribution $F(y, \theta^0)$, under the assumption that, for some $a$, $y$ has the distribution $F(y, \theta)$. Given a sample, one can form the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, 2, \ldots, n-1$, obtaining the distribution $G(w_1, \ldots, w_{n-1}, \theta)$, and then consider the simple hypothesis $\theta = \theta^0$, knowing by Theorem Ia that $G$ determines $\theta$. Similarly one can test whether $cx$, for some $c \ne 0$, has distribution $F(y, \theta^0)$, either by forming $w_\alpha = x_\alpha/x_n$, $\alpha = 1, \ldots, n-1$, or by expressing $(x_1, \ldots, x_n)$ in spherical coordinates and considering the angular coordinates. The choice depends on whether both positive and negative values of $c$, or only positive values, are to be allowed.

In the same way one can test the hypothesis $\theta = \theta^0$ under the assumption that $c(x - a)$ has distribution $F(y, \theta)$, either by forming

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \ldots, n-2,$$

or by expressing $(x_1 - x_n, \ldots, x_{n-1} - x_n)$ in spherical coordinates and considering the angular coordinates.

Theorem IVa may be applied to analogous problems in which the hypothesis $\theta = \theta^0$ is to be tested under the assumption that $y = u - \sum_i a_i x_i$ has distribution $F(y, \theta)$ for fixed values of the $x_i$, with the $a_i$ unknown. In such problems there exist linear combinations of the observed values of $y$ that are independent of the $a_i$. By Theorem IVa, under certain conditions the joint distribution of these linear combinations determines the original distribution of $y$, without regard to the $a_i$.

In applying some of the preceding results we must verify, in certain cases, that the zeros of $\int e^{itx}\, dF(x)$ are nowhere dense for a certain distribution function. By a change of variable the condition of Theorem Ib can be stated in this form; moreover, if $F(x)$ satisfies this condition, it evidently satisfies the condition of Theorem Ia. Levinson [4] has obtained a sufficient condition applicable to a considerable class of cases: if $f(x)$ is $O(e^{-\theta(x)})$ as $x \to \infty$, where $\theta(x)$ is monotone and $\int^{\infty} \theta(x)/x^2\, dx$ diverges to $\infty$, then $\int e^{itx} f(x)\, dx$ cannot vanish on an interval without vanishing identically. It is likewise sufficient if the corresponding condition holds as $x \to -\infty$ instead of $x \to \infty$. In particular, if there exists $A$ such that $f(x) = 0$ for $x > A$ (or for $x < A$), it follows from Levinson's result that $\int e^{itx} f(x)\, dx$ has no intervals of zeros.

It can be established easily that if $f(x)$ is majorized by $|x|^{\epsilon}$, $\epsilon > 0$, in the neighborhood of the origin, then $\int e^{it\log|x|} f(x)\, dx$ has no intervals of zeros.


As a simple example consider the rectangular distribution on $(0, 1)$. Let $(x - a)/r$ have this distribution, with $a$ unknown and $r > 0$, and suppose that we are interested only in $r$. Given a sample $x_1, \ldots, x_n$, form the functions $y_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, \ldots, n-1$. Set $y_M = \max(y_1, \ldots, y_{n-1}, 0)$ and $y_m = \min(y_1, \ldots, y_{n-1}, 0)$. It can then be shown that $y_1, \ldots, y_{n-1}$ have probability density $1 - y_M + y_m$ in the region $-1 < y_\alpha < 1$, $y_M - y_m < 1$, and zero elsewhere. The quantity $\psi = y_M - y_m$ is, of course, the quotient of the sample range by $r$, and it can be shown that $\psi$ has density $n(n-1)(1 - \psi)\psi^{n-2}$. Theorem Ia makes it possible to base any tests not involving $a$ on the distribution of the $y_\alpha$: if the $y_\alpha$ have the stated distribution, then $(x - a)/r$ must have the rectangular distribution for some $a$.

Similarly, suppose $y = (x - a)/r$ has the distribution $e^{-y}\,dy$, $y > 0$, for some $a$ and $r$. Then $w_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, 2, \ldots, n-1$, have distribution density

$$\frac{1}{n}\exp\Big(n w_m - \sum_{\alpha=1}^{n-1} w_\alpha\Big),$$

where $w_m = \min(0, w_1, \ldots, w_{n-1})$. Again, the latter distribution may be used to estimate $r$.
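The Python sketch below is ours and offers a quick sanity check of the range density quoted above. It confirms numerically that $n(n-1)(1-\psi)\psi^{n-2}$ integrates to 1 on $(0, 1)$. It also compares the mean of that density, $(n-1)/(n+1)$, with a simulated mean range of $n$ uniform observations. The simulation sets $a = 0$ and $r = 1$ because both cancel out of $\psi$. The sample size, seed, and function names are arbitrary choices.

```python
import random


def range_density(psi, n):
    """Density n(n-1)(1 - psi) psi^(n-2) of psi = (sample range) / r."""
    return n * (n - 1) * (1.0 - psi) * psi ** (n - 2)


def simulated_mean_range(n, trials, seed=0):
    """Average range of n uniform(0, 1) draws; a and r cancel out of psi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.random() for _ in range(n)]
        total += max(u) - min(u)
    return total / trials


n = 5
steps = 100_000
# Midpoint-rule integral of the density over (0, 1).
mass = sum(range_density((k + 0.5) / steps, n) for k in range(steps)) / steps
exact_mean = (n - 1) / (n + 1)  # mean of the stated density
sim_mean = simulated_mean_range(n, 100_000)
print(round(mass, 6), exact_mean, sim_mean)
```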

Let us examine the distributions of functions of this type in the case of normality. Assume that $x_1, \ldots, x_n$ are a sample of $n$ observations from a normal distribution with unit variance and unknown mean. The variables $y_\alpha = x_\alpha - x_1$, $\alpha = 2, \ldots, n$, have a joint normal distribution with zero means and matrix of variances and covariances $\|\lambda_{\alpha\beta}\| = \|1 + \delta_{\alpha\beta}\|$. Theorem Ia then shows that if $\{y_\alpha\}$ have this joint distribution, $x$ is normally distributed with unit variance. Note that $\chi^2_{n-1} = \sum (x_\alpha - \bar{x})^2$. If we had $x = x'/\sigma$, then $\sum (x'_\alpha - \bar{x}')^2 = \sigma^2\chi^2_{n-1}$, giving the estimate $\frac{1}{n-1}\sum (x'_\alpha - \bar{x}')^2$ for $\sigma^2$.

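There are, of course, many ways in which the matrix $\|\lambda_{\alpha\beta}\|$ may be transformed into a diagonal matrix in order to obtain a new set of independently distributed variates. One convenient set is

$$\sqrt{\tfrac{1}{2}}\,y_2, \quad \sqrt{\tfrac{2}{3}}\big(y_3 - \tfrac{1}{2}y_2\big), \quad \ldots, \quad \sqrt{\tfrac{n-1}{n}}\Big(y_n - \frac{1}{n-1}\sum_{\alpha=2}^{n-1} y_\alpha\Big).$$

In terms of the original $x$'s we have

$$\sqrt{\tfrac{1}{2}}\,(x_2 - x_1), \quad \sqrt{\tfrac{2}{3}}\big(x_3 - \tfrac{1}{2}(x_1 + x_2)\big), \quad \ldots, \quad \sqrt{\tfrac{n-1}{n}}\Big(x_n - \frac{1}{n-1}\sum_{\alpha=1}^{n-1} x_\alpha\Big);$$

these functions of the data are independently distributed according to the normal distribution with zero mean and unit variance.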
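The claim that these functions are independent with unit variance can be checked through their coefficient vectors, in a Python sketch of our own. Each row below gives the coefficients of one function on $x_1, \ldots, x_n$. Each row sums to zero, so the unknown mean drops out. The rows are also orthonormal, so for independent unit-variance $x$'s the covariance matrix of the new variates is the identity. The function names are hypothetical.

```python
import math


def helmert_rows(n):
    """Row k (k = 1, ..., n-1) holds the coefficients of
    sqrt(k/(k+1)) * (x_{k+1} - mean(x_1, ..., x_k)) on x_1, ..., x_n."""
    rows = []
    for k in range(1, n):
        scale = math.sqrt(k / (k + 1))
        rows.append([-scale / k] * k + [scale] + [0.0] * (n - k - 1))
    return rows


def gram(rows):
    """Inner products of the rows: the covariance matrix of the transformed
    variates when x_1, ..., x_n are independent with unit variance."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


rows = helmert_rows(5)
G = gram(rows)
for line in G:
    print([round(v, 12) for v in line])  # identity matrix up to rounding
```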

Similarly, in the case of a sample $x_1, \ldots, x_n$ from a normal distribution with zero mean and unknown variance, there exists a set of $n - 1$ functions whose distributions are independent of the variance. A convenient set of functions is

$$t_m = \frac{\sqrt{m}\,x_{m+1}}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^2}}, \qquad m = 1, \ldots, n-1.$$

It is known (see Bartlett [1]) that the variables $t_m$ are independently distributed according to Student $t$-distributions with $m$ degrees of freedom respectively. The set $\{t_m\}$ determines the set of angular coordinates obtained by expressing $x_1, \ldots, x_n$ in spherical coordinates. Hence we can conclude, conversely, that if $\{t_m\}$ have this joint distribution, then $x$ is normal with mean zero.
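The Python sketch below is ours and computes the statistics $t_m$ defined above. It checks that they are unchanged by a positive scale factor and that all of them change sign under $x \to -x$, consistent with their being functions of the angular coordinates. The function name is hypothetical.

```python
import math


def t_statistics(sample):
    """t_m = sqrt(m) x_{m+1} / sqrt(x_1^2 + ... + x_m^2), m = 1, ..., n - 1."""
    stats, sum_sq = [], 0.0
    for m in range(1, len(sample)):
        sum_sq += sample[m - 1] ** 2
        stats.append(math.sqrt(m) * sample[m] / math.sqrt(sum_sq))
    return stats


x = [3.0, 4.0, 12.0]
t = t_statistics(x)
print(t)                                     # t_1 = 4/3, t_2 = sqrt(2) * 12 / 5
print(t_statistics([2.0 * v for v in x]))    # unchanged by a positive scale factor
print(t_statistics([-v for v in x]))         # every t_m changes sign under x -> -x
```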
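Finally, we can eliminate both mean and variance. Suppose $x_1, \ldots, x_n$ are a sample from some normal distribution. The variables

$$u_m = \sqrt{\frac{m}{m+1}}\Big(x_{m+1} - \frac{1}{m}\sum_{\alpha=1}^{m} x_\alpha\Big), \qquad m = 1, 2, \ldots, n-1,$$

are normal and independent, with mean zero and some common variance. We then have the set

$$t'_r = \frac{\sqrt{r}\,u_{r+1}}{\sqrt{\sum_{m=1}^{r} u_m^2}}, \qquad r = 1, \ldots, n-2,$$

independently distributed according to $t$-distributions with $r$ degrees of freedom respectively. For computational purposes it may be convenient to use the identity

$$\sum_{m=1}^{r} u_m^2 = \sum_{j=1}^{r+1}\big(x_j - \bar{x}_{(r+1)}\big)^2, \qquad \bar{x}_{(r+1)} = \frac{1}{r+1}\sum_{j=1}^{r+1} x_j.$$

We then have

$$t'_r = \sqrt{\frac{r(r+1)}{r+2}}\;\frac{x_{r+2} - \bar{x}_{(r+1)}}{\sqrt{\sum_{j=1}^{r+1}\big(x_j - \bar{x}_{(r+1)}\big)^2}}, \qquad r = 1, \ldots, n-2.$$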
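The Python sketch below is ours and computes $t'_r$ in two ways. The first uses the contrasts $u_m$. The second uses the computational identity, which needs only running means and sums of squares of deviations. The two agree, and both are unchanged by an affine change $cx + b$ with $c > 0$. The function names and the data values are hypothetical.

```python
import math


def helmert_contrasts(sample):
    """u_m = sqrt(m/(m+1)) * (x_{m+1} - mean(x_1, ..., x_m)), m = 1, ..., n - 1."""
    u, running_sum = [], 0.0
    for m in range(1, len(sample)):
        running_sum += sample[m - 1]
        u.append(math.sqrt(m / (m + 1)) * (sample[m] - running_sum / m))
    return u


def t_prime_from_contrasts(sample):
    """t'_r = sqrt(r) u_{r+1} / sqrt(u_1^2 + ... + u_r^2), r = 1, ..., n - 2."""
    u = helmert_contrasts(sample)  # u[0] is u_1
    return [math.sqrt(r) * u[r] / math.sqrt(sum(v * v for v in u[:r]))
            for r in range(1, len(u))]


def t_prime_direct(sample):
    """Same statistics via sum_{m<=r} u_m^2 = sum_{j<=r+1} (x_j - mean_{r+1})^2."""
    out = []
    for r in range(1, len(sample) - 1):
        head = sample[:r + 1]  # x_1, ..., x_{r+1}
        mean = sum(head) / (r + 1)
        ss = sum((x - mean) ** 2 for x in head)
        factor = math.sqrt(r * (r + 1) / (r + 2))
        out.append(factor * (sample[r + 1] - mean) / math.sqrt(ss))
    return out


sample = [0.4, -1.2, 2.5, 0.7, 1.9, -0.3]
print(t_prime_from_contrasts(sample))
print(t_prime_direct(sample))
```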


Now, by Theorem IIIb, we know that if the set $\{t'_r\}$ has this specified distribution, then $x$ must be distributed according to some normal distribution. The set $\{t'_r\}$ may be used to test the goodness of fit of the observations to normality. First adjust the set to a standard basis of comparison, i.e., consider $F_r(t'_r)$, where $F_r$ is the corresponding cumulative distribution function. Then apply, for example, a $\chi^2$ goodness-of-fit test to these $n - 2$ quantities with respect to the rectangular distribution on $(0, 1)$.
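The goodness-of-fit procedure just described can be sketched in stdlib Python; the sketch is ours, not the author's. `student_t_cdf` evaluates the Student $t$ CDF for integer degrees of freedom using the classical finite series in $\theta = \arctan(t/\sqrt{\nu})$. Each $t'_r$ is mapped through $F_r$, and the resulting values are binned into equal cells on $(0, 1)$. The number of cells is an arbitrary assumption. Only the $\chi^2$ statistic is computed; turning it into a significance level would need a $\chi^2$ table or a library such as SciPy. All function names are hypothetical.

```python
import math
import random


def student_t_cdf(t, df):
    """CDF of Student's t with integer df >= 1, via theta = atan(t / sqrt(df))."""
    theta = math.atan(t / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    series, term = 0.0, 1.0
    if df % 2 == 1:
        # Odd df: A = (2/pi)[theta + sin cos (1 + (2/3)cos^2 + (2*4)/(3*5)cos^4 + ...)]
        for k in range(1, (df - 1) // 2 + 1):
            series += term
            term *= c2 * (2 * k) / (2 * k + 1)
        a = (2.0 / math.pi) * (theta + math.sin(theta) * math.cos(theta) * series)
    else:
        # Even df: A = sin (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        for k in range(1, df // 2 + 1):
            series += term
            term *= c2 * (2 * k - 1) / (2 * k)
        a = math.sin(theta) * series
    return 0.5 * (1.0 + a)


def gof_statistic(sample, bins=4):
    """Chi-square statistic of F_r(t'_r), r = 1, ..., n - 2, against uniform(0, 1)."""
    n = len(sample)
    counts = [0] * bins
    for r in range(1, n - 1):
        head = sample[:r + 1]
        mean = sum(head) / (r + 1)
        ss = sum((x - mean) ** 2 for x in head)
        t_r = math.sqrt(r * (r + 1) / (r + 2)) * (sample[r + 1] - mean) / math.sqrt(ss)
        u = student_t_cdf(t_r, r)
        counts[min(int(u * bins), bins - 1)] += 1
    expected = (n - 2) / bins
    return sum((c - expected) ** 2 / expected for c in counts), counts


rng = random.Random(1)
data = [rng.gauss(10.0, 3.0) for _ in range(42)]  # hypothetical normal data
stat, counts = gof_statistic(data)
print(counts, round(stat, 3))
```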
THE SELECTION OF VARIATES FOR USE IN PREDICTION WITH
SOME COMMENTS ON THE GENERAL PROBLEM OF
NUISANCE PARAMETERS

By Harold Hotelling

1. Maximum Correlation as a Test. For predicting or estimating a particular variate y there is frequently available an embarrassingly large number of other variates having some correlation with y. For example, in fitting demand functions by means of economic time series, the number of series of observations having some relation to the demand to be estimated is apt to be very large, whereas the number of good independent observations on each is quite small. The proper coefficients in the regression equation must ordinarily be determined from the observations, and must not exceed in number the observations on each variate. Furthermore, to have a measure of error that makes it possible to distinguish real effects from those due to chance, the number of predictors¹ must be sufficiently smaller than the number of observations on each variate for the residual chance variance to be determined with an appropriate degree of accuracy. It is desirable to select a set of predictors yielding estimates of maximum but determinable accuracy. At the same time we must avoid the fallacy of selecting, from numerous results, the one that appears most significant and treating it as if it were the only one examined.

Considerations other than maximum and determinate accuracy are of practical importance. The labor of calculation by the method of least squares becomes a serious obstacle to the use of the theoretically optimum set of variates when these are very numerous. The rapid current development of mechanical and electrical devices suitable for these computations does, however, offer hope that the limits now set in practice in this way will soon be considerably increased. Furthermore, predictions or estimates must often be made from moment to moment, as in speculative business or in military activity, frequently in a rough manner by persons incapable of or averse to using complex formulae. In such activities the regression equations must be revised frequently to accord with altered conditions. Also, in temporal predictions, the time at which the values of the predictors become available is important, since an early prediction (e.g., of the size of a harvest) is more valuable than a later one of the same accuracy.

¹ I use this term for what are often called the independent variates in a regression equation, since these ordinarily are not really independent in the probability sense. Similarly I shall call the "dependent" variate the predictand. By prediction I mean merely the use of regression equations to estimate some unknown variate by means of the values of related variates, without any necessary connotation of temporal order, though the most interesting applications seem for the most part to be those in which we pass from a knowledge of the past to an estimate of the future.

Suppose we make the usual assumption* that the probability distribution of y is, for every set of values of the predictors, normal with a fixed variance $\sigma^2$ and an expectation that is a linear function of the predictors. We shall then wish to minimize $\sigma^2$ subject to appropriate limitations. This amounts to the same thing as maximizing the multiple correlation $\rho$ of y with the predictors, since $1 - \rho^2$ is the ratio of $\sigma^2$ to the total variance of y, which is the same for all sets of predictors. The estimates $s$ and $R$ of $\sigma$ and $\rho$ obtained from the available sample are of course a different matter. But the value of $R$ clearly provides a suitable criterion of choice under the following conditions: we are called upon to choose one among two or more sets, each consisting of a fixed number of predictors; for each predictor we have a known value corresponding to each of the values $y_1, \ldots, y_N$ observed for the predictand; and there is no basis for preferring one of these sets to another either in theory, in observations extraneous to those just specified, or in cost or time of availability. In particular, if just one predictor is to be used, the one having the highest sample correlation with the predictand should under these conditions be adopted. But in making such a choice a test of its accuracy is required, to take account of the possibility that the wrong choice has been made because of chance fluctuations in the sample correlation coefficients.

There are innumerable economic variates available for prediction of
business conditions, and most of these are highly correlated with each other.
The selection of one business index instead of another for a particular purpose will involve the question of which has exhibited the higher correlation
with the quantity to be predicted, and consequently the question of the definiteness with which the difference between the calculated correlations can be
regarded as significant.

Our problem evidently has a bearing on governmental policy in selecting
among the numerous series of data those whose continuation will be most valuable. The high cost of assembling these statistics dictates a careful selection of
a limited number of series having little correlation with each other's current
values, but with correlations as great as possible with those things whose prediction or estimation is most important.

2. The Choice of One Predictor with Two Available. Let us take first the
simplest case, which may be illustrated by a Michigan State College problem of
which Dr. W. D. Baten has told me. The ultimate weight of a mature ox is
estimated by means of his length at an early age. The question has been raised,
however, whether a more accurate prediction might not be made by means of
the calf's girth at his heart. Records were at hand of 13 oxen showing their
lengths and girths as calves and also their weights when mature. A regression
equation involving both length and girth would presumably give greater accuracy
than either variate alone; but it appears that those who make the estimates
desire a simple formula involving only one variate. Suppose, then, that in such
a sample the correlation of weight with girth is $r_1 = .7$, that the correlation
of weight with length is $r_2 = .5$, and that the correlation of girth with length is
$r_0 = .4$. Is the difference $r_1 - r_2 = .2$ sufficiently great in relation to its sampling
errors to warrant the inference that girth is really a better predictor than
length, or must the question be left in abeyance until more observations can be
accumulated?

² We shall not here go into the question of the applicability of these standard assumptions to time series otherwise than to note that some transformations of observations
ordered in time are usually necessary and sufficient to obtain quantities satisfying the
assumptions so closely that deviations from them cannot be detected. Such transformations include replacing a variate by its logarithm, and eliminating trend and seasonal
variations by least squares. In view of the satisfactory adjusted observations found
empirically by these and similar methods, the usual objections to studying time series by
exact methods seem much exaggerated.

A straightforward procedure which would have been used with little question
before the advent of modern exact methods is to calculate the asymptotic approximation to the standard error of $r_1 - r_2$ by the differential method, assuming
the three variates to have the trivariate normal distribution, and to regard the
difference of the correlations as significant if it exceeds a multiple of this standard
error determined by the tables of the normal distribution. The calculation of
the asymptotic approximation $\sigma_{r_1 - r_2}$ may be carried out in the following manner.
Let $\rho_1$, $\rho_2$, and $\rho_0$ be the population values of $r_1$, $r_2$, and $r_0$ respectively. Then
if $\sigma_{ij}$ denotes the population covariance of $x_i$ and $x_j$ $(i, j = 0, 1, 2)$, with $x_0$ the predictand and $x_1$, $x_2$ the two predictors, we have

$$\rho_1 = \frac{\sigma_{01}}{\sqrt{\sigma_{00}\,\sigma_{11}}},$$

with similar formulae for $\rho_2$ and $\rho_0$. Likewise the sample estimates of these
parameters are given by such expressions as

$$r_1 = \frac{s_{01}}{\sqrt{s_{00}\,s_{11}}}.$$


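Taking the logarithm of this last expression, expanding about the population
values, denoting by the operator $\delta$ the deviation of sample from population values
of the covariances, and by $\delta r_1$ the resultant deviation in $r_1$, and dropping terms of
order higher than the first, we have:

$$\delta r_1 = \rho_1\left(\frac{\delta s_{01}}{\sigma_{01}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{11}}{2\sigma_{11}}\right).$$

In the same way

$$\delta r_2 = \rho_2\left(\frac{\delta s_{02}}{\sigma_{02}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{22}}{2\sigma_{22}}\right).$$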


The asymptotic value of the sampling covariance is obtained by multiplying
these two expressions together and taking the expectation. The sampling covariance of two estimates of covariance of the usual kind (sum of products
divided by number of degrees of freedom) in the same sample, having $n$ degrees
of freedom (which ordinarily means that there are $n + 1$ individuals in the
sample and that the means are eliminated), is given exactly by the formula³

$$E(\delta s_{ij}\,\delta s_{km}) = \frac{\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk}}{n},$$


in which the subscripts may have any values, equal or unequal. When this
formula is applied to each of the nine terms of the product and the results are
expressed in terms of the correlations $\rho_i$, there results the asymptotic expression
for the covariance given by

$$nE(\delta r_1\,\delta r_2) = \tfrac{1}{2}\rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + \rho_0(1 - \rho_1^2 - \rho_2^2).$$

This method provides also one of the derivations of the familiar formula, which
may be written

$$n\sigma_{r_i}^2 = nE(\delta r_i)^2 = (1 - \rho_i^2)^2 \qquad (i = 1, 2).$$

The variance of the difference of $r_1$ and $r_2$ is the sum of their variances minus
twice their covariance. Hence

$$n\sigma_{r_1 - r_2}^2 = (1 - \rho_1^2)^2 + (1 - \rho_2^2)^2 - \rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + 2\rho_0(\rho_1^2 + \rho_2^2 - 1).$$


We are testing the hypothesis that $\rho_1 = \rho_2$. If we put a common value $\rho$
for them in the last expression and simplify, we obtain for the standard error
of the difference

$$\sigma_{r_1 - r_2} = \sqrt{\frac{(1 - \rho_0)(2 - 3\rho^2 + \rho_0\rho^2)}{n}}.$$


The second factor in parentheses is always positive because of the inequalities 
limiting the correlations among three variates. 
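The large-sample formulas above can be evaluated directly. The sketch below (Python, standard library only; the function names are hypothetical) codes the general variance of $r_1 - r_2$ and its simplified form under $\rho_1 = \rho_2 = \rho$, and then applies the classical plug-in procedure to the illustrative figures of the ox example. Taking $n = N - 1 = 12$ and using the plain average of $r_1$ and $r_2$ for $\rho$ are assumptions made only for illustration; the resulting ratio has no exact error limits, which is the difficulty discussed next.

```python
import math


def n_var_diff(rho1: float, rho2: float, rho0: float) -> float:
    """n times the asymptotic variance of r1 - r2 (general case)."""
    return ((1 - rho1**2) ** 2 + (1 - rho2**2) ** 2
            - rho1 * rho2 * (rho1**2 + rho2**2 + rho0**2 - 1)
            + 2 * rho0 * (rho1**2 + rho2**2 - 1))


def se_diff_null(rho: float, rho0: float, n: int) -> float:
    """Asymptotic standard error of r1 - r2 when rho1 = rho2 = rho."""
    return math.sqrt((1 - rho0) * (2 - 3 * rho**2 + rho0 * rho**2) / n)


# Illustrative plug-in (assumptions: n = N - 1 = 12, rho = mean of r1 and r2).
r1, r2, r0, n = 0.7, 0.5, 0.4, 12
se = se_diff_null((r1 + r2) / 2, r0, n)   # about 0.231
ratio = (r1 - r2) / se                    # about 0.87, referred to the normal table

# The simplified form agrees with the general one when rho1 = rho2.
assert math.isclose(n_var_diff(0.6, 0.6, 0.4), se_diff_null(0.6, 0.4, 1) ** 2)
```

The check at the end confirms numerically that setting $\rho_1 = \rho_2$ in the general expression gives the factored form.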

This formula contains two unknown parameters, $\rho$ and $\rho_0$. The classical
procedure would be to substitute $r_1$, $r_2$, and $r_0$ respectively for $\rho_1$, $\rho_2$, and $\rho_0$ in the
previous formula, and use the resulting standard error expression as if the ratio
to it of $r_1 - r_2$ were normally distributed. A first modification, more in line with
modern ideas, would be to use some kind of average of $r_1$ and $r_2$ as an estimate
of both $\rho_1$ and $\rho_2$, since the null hypothesis tested is that these are equal. But
whatever sample estimates we substitute for $\rho$ and $\rho_0$, the formula remains unsatisfactory, since no suitable limits of error are available. If instead of the
standard error we were to work out the exact distribution of $r_1 - r_2$, we should
still not be free from the difficulty. This exact distribution clearly involves
both $\rho$ and $\rho_0$, since its variance does so. Neither can we escape from the
trouble by using some function $z = f(r)$, such as the inverse hyperbolic tangent
suggested by R. A. Fisher, and considering the standard error of $z_1 - z_2 = f(r_1) - f(r_2)$, for this standard error will have as the first term in its expansion
in a series of powers of $n^{-1/2}$ simply the product of the expression above for
$\sigma_{r_1 - r_2}$ by $f'(\rho)$, and this must clearly involve both $\rho_0$ and $\rho$.

³ I have given a derivation of this formula from the characteristic function of the multivariate normal distribution [1]. Numerous special cases appear in earlier literature. The
derivation above is a simplification and improvement of several versions appearing in
the various early writings of Karl Pearson.

3. Nuisance Parameters. This is not by any means the only statistical problem in which unknown and undesired parameters enter into the distribution of
the statistic which we should naturally use to test a hypothesis. Indeed, the
early investigation which was perhaps most influential in setting the whole tone
of modern statistical research was that [2] in which W. S. Gosset ("Student")
arrived at the exact distribution of the ratio of a deviation in the mean to the
estimated standard error. The previous practice (which unfortunately survives
today in some quarters, and is even taught to students without explaining its
approximate character) was to neglect the sampling errors in the estimate of
the unknown variance $\sigma^2$ and to treat the ratio as normally distributed with
unit variance. The rigorous derivation by Fisher [3] of the Student distribution
makes clear the manner in which the nuisance parameter may in this, and in
some other, problems be eradicated from the distribution through integration,
after altering the original statistic (the deviation in the mean) by dividing it
by another statistic. The new statistic, the Student ratio, vanishes whenever
the old statistic, the deviation in the mean, does so, and the same hypothesis
is tested by both. This then is one way to get rid of a nuisance parameter:
when you have a statistic estimating a parameter whose vanishing is in question,
but whose distribution involves another parameter, alter the statistic by multiplying or dividing by another statistic in such a way that the new function
vanishes whenever the old one does so, and do this in such a way that the new
distribution will be independent of the nuisance parameter. Unhappily, this
method has been applied successfully only in particular cases, and no way to
use it in the problem at hand has been found.
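The effect of studentizing can be seen numerically. In the sketch below (Python with NumPy; the function name, sample size and number of replications are illustrative assumptions), the same standard normal draws are rescaled by two different values of $\sigma$. The raw deviation in the mean scales with $\sigma$, while the Student ratio is unchanged, so its distribution cannot involve the nuisance parameter; its upper 2.5% point should be close to the tabled value 2.262 for 9 degrees of freedom.

```python
import numpy as np


def mean_statistics(sigma: float, N: int = 10, reps: int = 100_000, seed: int = 0):
    """Return (Student ratio, raw scaled mean) for reps samples of N draws from N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = sigma * rng.standard_normal((reps, N))  # same underlying draws for every sigma
    mean = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)                   # estimated standard deviation
    return np.sqrt(N) * mean / s, np.sqrt(N) * mean


t_1, raw_1 = mean_statistics(sigma=1.0)
t_5, raw_5 = mean_statistics(sigma=5.0)

print(np.allclose(t_1, t_5))           # Student ratio does not depend on sigma
print(raw_5.std() / raw_1.std())       # raw statistic scales with sigma (ratio 5)
print(np.quantile(t_1, 0.975))         # close to 2.262 (9 degrees of freedom)
```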

A second method is that of transformation, employed by Fisher in dealing with
such problems as testing the significance of the difference between the correlation coefficients in independent samples between the same two variates. The
need for the transformation in this case is occasioned by the presence, in the
distribution of the difference of the sample correlations, of the unknown true
value, which is not directly relevant to the comparison. We have seen that
this method also fails to solve our problem.
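For the case in which Fisher's transformation does work, correlations from two independent samples, the comparison can be sketched as follows (Python, standard library only; the function name and the sample sizes in the usage line are hypothetical). It uses $z = \tanh^{-1} r$ with the usual large-sample variance $1/(n - 3)$, $n$ being the number of observations in each sample. It is not valid for the problem of paragraph 2, where $r_1$ and $r_2$ share the predictand and the variance of $z_1 - z_2$ again involves $\rho$ and $\rho_0$.

```python
import math


def fisher_z_independent(r1: float, n1: int, r2: float, n2: int):
    """Approximate normal test of rho1 = rho2 for correlations from independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher's z = artanh(r)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # large-sample variance of z is 1/(n - 3)
    stat = (z1 - z2) / se
    p_two_sided = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_two_sided


stat, p = fisher_z_independent(0.7, 50, 0.5, 50)  # hypothetical sample sizes
# stat is about 1.54
```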

A third method of dealing with nuisance parameters is the use of fiducial
probability by R. A. Fisher [4] and by Daisy M. Starkey [5] in testing the
significance of the difference between the means of two samples when the
variances may be unequal. Criticisms of these applications of fiducial probability
have been made by M. S. Bartlett [6] and B. L. Welch [7], and the field of
applicability of such methods is still in need of elucidation.

Some findings of J. Neyman [8] having a bearing on the general nuisance 
parameter problem should also be noted. 

The only other class of methods for dealing with nuisance parameters of which
I am aware involves the comparison of the particular sample obtained, not with
the whole population of samples with which a comparison might be made if we
knew the value of the troublesome parameter, but with a sub-population selected
with reference to the sample in such a way that the distribution, in this sub-population, of the statistic used does not involve any unknown parameter. An
example is the testing of significance of a regression coefficient. Thus if we
suppose that a sample of values of $x$ and $y$ is drawn from a bivariate normal
population, and calculate the regression coefficient $b$ of $y$ on $x$ in the sample,
the distribution of $b$ involves not only the population value $\beta$ but also the ratio
of the variances in the population. Since this second parameter is unknown,
and can only be estimated from the sample, it is not possible to use the distribution of $b$ in the whole population directly to test the significance of $b - \beta$.
What we do is to find the place of this difference, not in the whole population
of values in which both $x$ and $y$ are drawn at random, but in a sub-population
for which the values of $x$ are the same as in our sample. We may alternatively
say that we limit the sub-population only to that for which the sum of the
squares of the deviations of the values of $x$ from their mean is the same as in
our sample; the results are the same. The distribution in this sub-population
of the ratio of $b - \beta$ to its estimated standard error is of the Student form, with
no unknown parameters, and on this basis it is possible to make exact and
satisfactory tests and to set up fiducial limits for $\beta$. Another example is that
of contingency tables. The practice now accepted (after a controversy) for
testing independence of two modes of classification, such as classification
of persons according as they have or have not been vaccinated, and again according as they live through an epidemic or die, is to compare the observed
contingency table, not with all possible contingency tables of the same numbers
of rows and columns, but only with the possible contingency tables having
exactly the same marginal totals as the observed table.
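The conditional argument for a regression coefficient can be written out in a few lines (Python with NumPy; the function name and the toy data are illustrative assumptions). Holding the $x$ values fixed, only $S(x - \bar x)^2$ enters the standard error of $b$, and the ratio of $b - \beta$ to its estimated standard error is referred to the Student distribution with $N - 2$ degrees of freedom, whatever the value of $\sigma$.

```python
import numpy as np


def slope_student_ratio(x, y, beta0: float = 0.0):
    """(b - beta0) / est. s.e.(b) for the regression of y on fixed x, with N - 2 d.f."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(x)
    xc = x - x.mean()
    sxx = xc @ xc                     # the only feature of the x's that matters
    b = xc @ (y - y.mean()) / sxx     # sample regression coefficient of y on x
    resid = (y - y.mean()) - b * xc
    s2 = resid @ resid / (N - 2)      # unbiased estimate of the residual variance
    return (b - beta0) / np.sqrt(s2 / sxx), N - 2


t, df = slope_student_ratio([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
# b = 0.8 here; t is about 2.31 on 3 degrees of freedom
```

Multiplying every $y$ by a constant leaves the ratio unchanged when $\beta_0 = 0$, another face of its freedom from $\sigma$.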

4. An Exact Solution. We shall solve the problem of the significance of the
difference of $r_1$ and $r_2$ with the understanding that the meaning of significance
is to be interpreted by reference to the sub-population of possible samples for
which the predictors $x_1$ and $x_2$ have the same set of values as those observed in
the particular sample available. This procedure, besides yielding an exact
distribution without unknown parameters, has the advantage of relaxing the
stringency of the requirement of a trivariate normal distribution. We now make
only the assumptions customary in the method of least squares: that the predictand $y$ has the univariate normal distribution for each set of values of $x_1$ and
$x_2$, independently for the different sets, with a common variance $\sigma^2$, and with
the expectation of $y$ for a fixed pair of values of the predictors a linear function
of these predictors. No assumption is involved regarding the distribution of
the predictors, since we regard them as fixed in all the samples with which we
compare our particular sample. The advantages of exactness and of freedom
from the somewhat special trivariate normal assumption are attained at the
expense of sacrificing the precise applicability of the results to other sets of
values of the predictors.

Since the correlational properties are unchanged by additive and multiplicative constants, we may suppose that

$$Sx_1 = 0 = Sx_2, \qquad Sx_1^2 = 1 = Sx_2^2, \tag{1}$$

where $S$ stands for summation over a sample of $N$ individuals. The notation
may be made more explicit by the adjunction of an additional subscript $\alpha$, varying from 1 to $N$, to denote the individual member of the sample, so that instead
of $Sx_1$, for example, we might write $\sum_{\alpha=1}^{N} x_{1\alpha}$. The omission of this additional
subscript is convenient and will usually leave no ambiguity when we deal with
sums, but it will be convenient to retain it in connection with individual values.
The correlation $r_0$ of $x_1$ with $x_2$ in all those samples we shall consider is, by (1),

$$r_0 = Sx_1x_2.$$


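Now consider the new quantities

$$x' = \frac{x_1 - x_2}{\sqrt{2(1 - r_0)}}, \qquad x'' = \frac{x_1 + x_2}{\sqrt{2(1 + r_0)}}. \tag{2}$$

Evidently, from (1) and (2),

$$Sx' = 0 = Sx'', \qquad Sx'^2 = 1 = Sx''^2, \qquad Sx'x'' = 0. \tag{3}$$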

Since the mean value $E(y_\alpha)$ is a linear function of $x_{1\alpha}$ and $x_{2\alpha}$, $y_\alpha$ may, upon
subtracting a constant from all these expectations, be written

$$y_\alpha = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \Delta_\alpha, \tag{4}$$

where $\Delta_1, \dots, \Delta_N$ are normally and independently distributed with variances
all equal to $\sigma^2$ and expectations zero. The assumption that $x_1$ and $x_2$ are equally
correlated with $y$ in the population leads to the conclusion that $\beta_1 = \beta_2$; and,
putting $\beta = \beta_1\sqrt{2(1 + r_0)}$, we then have from (4) and (2):

$$y_\alpha = \beta x''_\alpha + \Delta_\alpha. \tag{5}$$

Consequently, by (3),

$$Sx'y = Sx'_\alpha y_\alpha = \beta\,Sx'x'' + Sx'\Delta = Sx'\Delta,$$

and this function has a normal distribution with zero mean and variance $\sigma^2$.
If in the sample we work out a regression equation

$$Y = a + b'x' + b''x'',$$

the normal equations for determining $b'$ and $b''$ must by (3) take the simple forms

$$a = \bar y, \qquad b' = Sx'y, \qquad b'' = Sx''y.$$
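From the general theory of least squares it is known that the sum of squares
of residuals is

$$Sv^2 = S(y - Y)^2 = Sy^2 - \bar y\,Sy - (Sx'y)^2 - (Sx''y)^2,$$

and that $Sv^2/\sigma^2$ has the $\chi^2$ distribution with $n = N - 3$ degrees of freedom,
independently both of $Sx'y$ and of $Sx''y$. From these facts it follows that

$$t = Sx'y\,\sqrt{\frac{n}{Sv^2}} \tag{6}$$

has the Student distribution with $n$ degrees of freedom. Since in accordance
with the foregoing definitions and (1) we have

$$Sx'y = \frac{(r_1 - r_2)\sqrt{S(y - \bar y)^2}}{\sqrt{2(1 - r_0)}},$$

and since also it is known that

$$Sv^2 = S(y - \bar y)^2\,(1 - R^2),$$

where $R$ is the sample multiple correlation of $y$ with $x_1$ and $x_2$, given by

$$R^2 = \frac{r_1^2 + r_2^2 - 2r_0r_1r_2}{1 - r_0^2},$$

(6) may be written

$$t = (r_1 - r_2)\sqrt{\frac{n(1 + r_0)}{2(1 - r_0^2 - r_1^2 - r_2^2 + 2r_0r_1r_2)}}. \tag{7}$$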

The probability of a greater value of $|t|$ is given by tables of the Student
distribution with $n = N - 3$. If this probability is sufficiently small (which
conventionally means less than .05, or sometimes .01), we have a corresponding
degree of confidence that the variate chosen because of a higher correlation in
the sample has actually a higher correlation than the other in the population.
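The test of this paragraph translates directly into code. The sketch below (Python with NumPy; function names hypothetical) computes $t$ in two ways: from the three correlations by (7), and from raw data by forming $x'$ and $x''$ as in (2) and studentizing $Sx'y$ as in (6). The two routes agree, which is a useful check on (7). For the illustrative correlations of the ox example ($N = 13$) it gives $t \approx 0.86$ on 10 degrees of freedom, far from significance; the probability itself would be read from Student tables or, for example, from `scipy.stats.t.sf`.

```python
import numpy as np


def hotelling_t(r1: float, r2: float, r0: float, N: int):
    """Equation (7): t for rho1 = rho2 with the predictors held fixed; n = N - 3 d.f."""
    n = N - 3
    det = 1 - r0**2 - r1**2 - r2**2 + 2 * r0 * r1 * r2  # determinant of the 3x3 correlation matrix
    return (r1 - r2) * np.sqrt(n * (1 + r0) / (2 * det)), n


def hotelling_t_from_data(y, x1, x2):
    """Equation (6), built from x' and x'' of (2)."""
    def normalize(v):
        v = np.asarray(v, dtype=float)
        v = v - v.mean()
        return v / np.sqrt(v @ v)            # conditions (1): zero sum, unit sum of squares

    y = np.asarray(y, dtype=float)
    x1, x2 = normalize(x1), normalize(x2)
    r0 = x1 @ x2
    xp = (x1 - x2) / np.sqrt(2 * (1 - r0))   # x'
    xpp = (x1 + x2) / np.sqrt(2 * (1 + r0))  # x''
    yc = y - y.mean()
    b_p, b_pp = xp @ yc, xpp @ yc            # b' = Sx'y, b'' = Sx''y
    Sv2 = yc @ yc - b_p**2 - b_pp**2         # residual sum of squares
    n = len(y) - 3
    return b_p * np.sqrt(n / Sv2), n


t_ox, n_ox = hotelling_t(0.7, 0.5, 0.4, N=13)   # about 0.858 on 10 d.f.
```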


5. The Selection of One Variate from Among Three or More. Suppose that
we are to choose one of the variates $x_1, \dots, x_p$ in order to predict $y$ $(p < N - 1)$.
We choose the one having highest correlation, and wonder how much confidence
to place in this choice. We shall now determine the distribution of a function
suitable for testing the hypothesis that there is no real difference between any
pair of the correlations of $x_1, \dots, x_p$ with $y$. Again we shall assume the values
of these predictors fixed, and look for the place of our particular sample among
all samples having these values, with only $y$ free to vary normally by chance.

Let $a_{ij} = S(x_i - \bar x_i)(x_j - \bar x_j)$, and let $c_{ij}$ be the cofactor of $a_{ij}$ in the determinant $a$ of these quantities, divided by $a$. Then

$$\sum_i a_{ji}c_{ik} = \delta_{jk} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases} \tag{8}$$
Here $\Sigma$ stands for summation from 1 to $p$. Let

$$w_i = \frac{\sum_j c_{ij}}{\sum\sum c_{ij}}, \tag{9}$$

$$l_i = S(x_i - \bar x_i)\,y, \tag{10}$$

$$l = \sum_i w_i l_i. \tag{11}$$

From (9) it follows that

$$\sum_i w_i = 1. \tag{12}$$

From the hypothesis that $y$ is in the population equally correlated with all the
$x_i$ it follows that $l_1, \dots, l_p$ have equal expectations, which we may denote by $\lambda$,
and from (11) and (12) it follows that also $E(l) = \lambda$. Obviously

$$E(l_i - \lambda)(l_j - \lambda) = \sigma^2 a_{ij}, \tag{13}$$

where $\sigma^2$ is the variance of those values of $y$ corresponding to a fixed set of
values of the $x$'s. From (11), (13) and (9) we obtain

$$E(l - \lambda)^2 = \frac{\sigma^2}{\sum\sum c_{ij}}. \tag{14}$$


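Since the $l_i$ are linear functions of the $y$'s, they have the multivariate normal
distribution. From the theory of this distribution and the values (13) of the
covariances it follows that the distribution has the form

$$(2\pi\sigma^2)^{-p/2}\,a^{-1/2}\,e^{-T/(2\sigma^2)}\,dl_1 \cdots dl_p,$$

where $a$ is the determinant of the $a_{ij}$'s, and

$$T = \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda).$$

We may introduce linear functions $l_1', \dots, l_p'$ of $l_1 - \lambda, \dots, l_p - \lambda$ such that
$T = l_1'^2 + \dots + l_p'^2$ and such that $l_p'^2 = (l - \lambda)^2\sum\sum c_{ij}$. Now

$$\frac{T - l_p'^2}{\sigma^2}$$

has the $\chi^2$ distribution with $p - 1$ degrees of freedom. The numerator of this
expression equals

$$\begin{aligned} T - l_p'^2 &= \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda) - (l - \lambda)^2 \sum\sum c_{ij} \\ &= \sum\sum c_{ij}\,l_i l_j - l^2 \sum\sum c_{ij} \\ &= \sum\sum c_{ij}(l_i - l)(l_j - l). \end{aligned}$$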


The penultimate form shows that this function is independent of $\lambda$; the last,
as a positive definite form in the deviations of the $l$'s from their weighted mean,
shows that sufficiently large values of the expression will reveal with definiteness
the inequality of the predicting powers of the $p$ variates when this exists.
It is well known that the regression coefficients of $y$ upon the set of variates
$x_1, \dots, x_p$ are completely independent of the sum of squares $Sv^2$ of residuals
from the regression equation. Since the $l$'s are linear functions of these regression coefficients (namely the linear functions appearing in the normal equations), they also are independent of $Sv^2$. Hence, if we put

$$s_1^2 = \frac{\sum\sum c_{ij}(l_i - l)(l_j - l)}{p - 1}, \qquad s_2^2 = \frac{Sv^2}{N - p - 1},$$

the ratio $F = s_1^2/s_2^2$ will, in case of equality of the correlations of the various
$x$'s with $y$, have the variance ratio distribution with $n_1 = p - 1$ and $n_2 = N - p - 1$ degrees of freedom. When $p = 2$ this test reduces exactly to (7), as it
should, and $F = t^2$.

In the numerical application of this method, the regression coefficients $b_i$
of $y$ on $x_1, \dots, x_p$ should first be worked out by the inverse matrix method.
The right-hand members of the normal equations are $l_1, \dots, l_p$, the coefficients
in these equations are the $a_{ij}$, and the calculation of $s_1^2$ is simplified with the
help of the identity

$$\sum\sum c_{ij}\,l_i l_j = \sum b_i l_i,$$

which holds because $b_i = \sum_j c_{ij} l_j$.
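The computation of paragraph 5 can be sketched as follows (Python with NumPy; the function name is hypothetical). As an assumption consistent with (1), each predictor is first reduced to zero sum and unit sum of squares, so that equal correlations with $y$ correspond to equal expectations of the $l_i$. The $c_{ij}$ come from inverting the matrix of the $a_{ij}$, the $b_i$ from $b_i = \sum_j c_{ij} l_j$, and the numerator of $s_1^2$ from $\sum\sum c_{ij}(l_i - l)(l_j - l) = \sum b_i l_i - l^2\sum\sum c_{ij}$. With $p = 2$ the value of $F$ coincides with $t^2$ from (7).

```python
import numpy as np


def equal_correlation_F(y, X):
    """F = s1^2 / s2^2 of paragraph 5 for H: y equally correlated with all p columns of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.sqrt((Xc**2).sum(axis=0))   # assumption: unit sums of squares, as in (1)
    yc = y - y.mean()
    a = Xc.T @ Xc                  # a_ij
    c = np.linalg.inv(a)           # c_ij, the inverse matrix
    l = Xc.T @ yc                  # l_i, right-hand members of the normal equations
    w = c.sum(axis=1) / c.sum()    # weights w_i of (9)
    lbar = w @ l                   # l of (11)
    b = c @ l                      # regression coefficients b_i
    Q = b @ l - lbar**2 * c.sum()  # equals sum sum c_ij (l_i - l)(l_j - l)
    Sv2 = yc @ yc - b @ l          # residual sum of squares
    n1, n2 = p - 1, N - p - 1
    return (Q / n1) / (Sv2 / n2), n1, n2
```

The returned degrees of freedom are those with which $F$ is to be judged.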

6. Selection of Additional Variates When Some Have Been Chosen. Suppose now that $q$ predictors have been included definitely in the regression equation, and that one more is to be selected for inclusion from among $p$ additional predictors that are available. The criterion now is that the predictor to be chosen
tentatively is the one having the highest partial correlation with the predictand, eliminating those already definitely chosen; but the confidence to be placed in the
choice is to be judged by an adaptation of the criterion of the preceding section.
It is only necessary to consider the $a_{ij}$, $l_i$, $c_{ij}$ and $b_i$ $(i, j = 1, \dots, p)$ as calculated from the new predictors and the deviations of $y$ from the regression
equation on the predictors already adopted. Formulae may easily be derived
for the values of these quantities in terms of those already found and the sums
of products, so as to simplify the calculations. $Sv^2$ will now stand for the sum
of squares of residuals from the regression equation involving all the $p + q$
predictors. It is to be divided by $N - p - q - 1$ to obtain $s_2^2$. The numbers
of degrees of freedom with respect to which $F$ is to be judged are now $n_1 = p - 1$
and $n_2 = N - p - q - 1$. When $p = 2$ this test, like that of the preceding
section, reduces to the use of the $t$-distribution of (7), with $n = N - q - 3$,
and the correlations standing for partial correlations eliminating the predictors
already definitely chosen.
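One way to carry out the elimination described in paragraph 6 is to replace $y$ and the $p$ candidate predictors by their residuals from least-squares regression on a constant and the $q$ adopted predictors, and then repeat the computation of paragraph 5 with $n_2 = N - p - q - 1$. This residual route is an assumption of the sketch (Python with NumPy; names hypothetical): it yields the partial correlations and, by the usual least-squares algebra, the same $Sv^2$ as the regression on all $p + q$ predictors. With $p = 2$ the result equals $t^2$ of (7) computed from the partial correlations with $n = N - q - 3$.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of v (vector or matrix) from least squares on a constant and Z."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


def partial_equal_correlation_F(y, X, Z):
    """F of paragraph 6: p candidates X (N x p), q adopted predictors Z (N x q)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    N, p = X.shape
    q = Z.shape[1]
    e = residualize(y, Z)                    # deviations of y from the adopted regression
    E = residualize(X, Z)                    # candidates with the adopted predictors removed
    E = E / np.sqrt((E**2).sum(axis=0))      # unit sums of squares, as in (1)
    c = np.linalg.inv(E.T @ E)               # c_ij
    l = E.T @ e                              # l_i
    lbar = c.sum(axis=1) @ l / c.sum()       # weighted mean l, weights of (9)
    b = c @ l                                # b_i
    Q = b @ l - lbar**2 * c.sum()            # sum sum c_ij (l_i - l)(l_j - l)
    Sv2 = e @ e - b @ l                      # residual SS on all p + q predictors
    n1, n2 = p - 1, N - p - q - 1
    return (Q / n1) / (Sv2 / n2), n1, n2
```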

A special instance in which this procedure is applicable is in economic time
series, in which time, in the form of orthogonal polynomials, must ordinarily be
"partialled out" in order that tests of significance may be sound.
7. Further Problems. It is natural to ask whether the foregoing work can be
extended to examine the soundness of the selection, on the basis of a greater
multiple correlation, of a particular set of two or more variates, chosen from
among several such sets. The simplest such problem that goes beyond what
has been done above deals with two sets, each of two predictors, having in a
sample multiple correlations $R$ and $R'$ with the predictand. The question is
whether the difference $R - R'$ is significant.

Suppose that, in the interests of simplicity and the hope of attaining a solution satisfactorily free from unknown parameters, we assume as before that the
predictors have a fixed set of values, the same in all samples. Since multiple
correlations are invariant under linear transformations of predictors, we may
without loss of generality assume that the predictors in each set are mutually
uncorrelated and have sums of squares equal to unity. Indeed, we may go
somewhat further in standardizing the sets of values to which consideration can
be confined without loss of generality, with the help of some ideas introduced
in the paper [1]. In the terminology of that paper, the variates in each set may
be considered canonical with respect to the relationship between the sets. This
means that linear functions $x_1$ and $x_2$ of the two variates in one set, and linear
functions $x_1'$ and $x_2'$ of those in the other set, can be chosen so as to satisfy not
only the conditions

$$\begin{gathered} Sx_1 = Sx_2 = Sx_1' = Sx_2' = 0, \\ Sx_1^2 = Sx_2^2 = Sx_1'^2 = Sx_2'^2 = 1, \\ Sx_1x_2 = 0 = Sx_1'x_2', \end{gathered} \tag{15}$$

but also the further conditions

$$Sx_1x_2' = 0 = Sx_2x_1'. \tag{16}$$

This means that, for all the purposes in view, the two sets of predictors can be
characterized as to their mutual relationships by the values of the remaining
two sums of products, namely

$$c_1 = Sx_1x_1', \qquad c_2 = Sx_2x_2'.$$

In view of the conditions assumed earlier, $c_1$ and $c_2$ are what have been called
the canonical correlations between the two sets.
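Numerically, the standardization (15)-(16) and the canonical correlations $c_1, c_2$ can be obtained by a QR decomposition of each centered set of predictors, followed by a singular value decomposition of the cross-product of the two orthonormal bases. This QR/SVD route is a standard numerical technique swapped in for illustration, not the construction of [1]; the function name and the example data are hypothetical.

```python
import numpy as np


def canonical_standardize(X, Xp):
    """Return Z, Zp satisfying (15)-(16), and the canonical correlations c_j."""
    X = np.asarray(X, dtype=float)
    Xp = np.asarray(Xp, dtype=float)
    Q1, _ = np.linalg.qr(X - X.mean(axis=0))    # orthonormal, zero-sum basis of set 1
    Q2, _ = np.linalg.qr(Xp - Xp.mean(axis=0))  # same for set 2
    U, c, Vt = np.linalg.svd(Q1.T @ Q2)         # diagonalize the cross-products
    return Q1 @ U, Q2 @ Vt.T, c


rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))                                       # first set (illustrative)
Xp = X @ np.array([[1.0, 0.3], [0.2, 1.0]]) + rng.normal(size=(50, 2))  # second set
Z, Zp, c = canonical_standardize(X, Xp)
# Z.T @ Z and Zp.T @ Zp are identity matrices; Z.T @ Zp is diag(c)
```

The singular values come out in descending order, so $c_1 \ge c_2$.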

To the sets thus standardized, the predictand $y$ is related in a manner expressed
by the population regression coefficients $\beta_1$ and $\beta_2$ of $y$ on the first set, and $\beta_1'$
and $\beta_2'$ on the second. If we take $y$ as having unit variance in the population,
the squared multiple correlation coefficients in the two cases will be

$$\rho^2 = \beta_1^2 + \beta_2^2, \qquad \rho'^2 = \beta_1'^2 + \beta_2'^2.$$

The hypothesis to be tested is that $\rho = \rho'$. If $b_1, b_2, b_1', b_2'$ denote the sample
estimates of the regression coefficients, the statistic appropriate for the test
would appear necessarily to be proportional to

$$w = \tfrac{1}{2}(b_1^2 + b_2^2 - b_1'^2 - b_2'^2).$$
The sample regression coefficients are normally distributed, with population
correlations equal to the sample correlations among the corresponding predictors.
The variance of each is $\sigma^2$. Thus their joint distribution may be written down
at once, in a rather simple form in view of (15) and (16). From this it is possible to determine directly the characteristic function $M(t) = Ee^{tw}$ of $w$. If
we write $K(t) = \log M(t)$, we obtain

$$2K(t) = \sum_j \frac{\sigma^2(\beta_j^2 - 2c_j\beta_j\beta_j' + \beta_j'^2)\,t^2 + (\beta_j^2 - \beta_j'^2)\,t}{1 - (1 - c_j^2)\sigma^4t^2} - \sum_j \log\{1 - (1 - c_j^2)\sigma^4t^2\}.$$

Here the summations are with respect to $j$ over the values 1 and 2. If each set
of predictors had had $s$ members, the same result would hold for $K(t)$ except
that the summations with respect to $j$ would then extend from 1 to $s$.
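Because the expression for $2K(t)$ has many terms, a numerical check is reassuring. The sketch below (Python with NumPy; names and parameter values are illustrative assumptions) codes $2K(t)$, simulates $w$ directly from independent pairs $(b_j, b_j')$ with means $\beta_j, \beta_j'$, common variance $\sigma^2$ and correlation $c_j$, and compares the Monte Carlo value of $Ee^{tw}$ with $e^{K(t)}$. Differentiating at $t = 0$ also recovers $E(w) = \tfrac12\sum_j(\beta_j^2 - \beta_j'^2)$ and $\operatorname{Var}(w) = \sum_j[\sigma^2(\beta_j^2 - 2c_j\beta_j\beta_j' + \beta_j'^2) + (1 - c_j^2)\sigma^4]$.

```python
import numpy as np


def two_K(t, beta, beta_p, c, sigma=1.0):
    """2K(t) = 2 log E exp(t w) for w = (1/2) sum_j (b_j^2 - b_j'^2)."""
    beta, beta_p, c = (np.asarray(v, dtype=float) for v in (beta, beta_p, c))
    denom = 1 - (1 - c**2) * sigma**4 * t**2          # must stay positive
    num = (sigma**2 * (beta**2 - 2 * c * beta * beta_p + beta_p**2) * t**2
           + (beta**2 - beta_p**2) * t)
    return np.sum(num / denom) - np.sum(np.log(denom))


def simulate_w(beta, beta_p, c, sigma=1.0, reps=400_000, seed=0):
    """Draw w from independent bivariate normal pairs (b_j, b_j') with corr c_j."""
    beta, beta_p, c = (np.asarray(v, dtype=float) for v in (beta, beta_p, c))
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal((reps, len(c)))
    z2 = rng.standard_normal((reps, len(c)))
    b = beta + sigma * z1
    b_p = beta_p + sigma * (c * z1 + np.sqrt(1 - c**2) * z2)
    return 0.5 * (b**2 - b_p**2).sum(axis=1)


beta, beta_p, c = [0.5, 0.2], [0.4, 0.3], [0.6, 0.3]   # illustrative values
t = 0.3
mc = np.mean(np.exp(t * simulate_w(beta, beta_p, c)))
exact = np.exp(two_K(t, beta, beta_p, c) / 2)
# mc and exact agree to within Monte Carlo error (both about 1.096)
```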

This is a very disappointing result because it contains so many parameters.
The distribution of $w$ must contain the same parameters as its characteristic
function. All four parameters $\beta_1, \beta_2, \beta_1', \beta_2'$ appear in the expression above, though
their effective number is reduced to three by the condition that the two sums
of squares shall be equal, which constitutes the hypothesis under test. The
distribution of $w$ thus contains at least three unknown parameters besides $\sigma$.

The estimate of variance obtained from the residuals from the grand regression equation of $y$ on $x_1$, $x_2$, $x_1'$, and $x_2'$ is independent of $w$. Its distribution is of the usual form and involves a parameter, the population variance,
which is a function of … and …. We could therefore pass by a single
integration from the distribution of $w$ to that of the statistic $w/s^2$, which vanishes
with $w$, and which on this account, and on grounds of physical dimensionality,
might be considered appropriate to test the hypothesis that $\rho = \rho'$. The question may be raised whether the distribution of this ratio might not be free from
parameters. The answer unfortunately is in the negative, as appears from an
examination of the characteristic function of the ratio. Even in the simplified
case in which all the $c_j$ are equal, a troublesome parameter persists in the
distribution.

Thus we meet again the problem of nuisance parameters, and this time no
escape is visible. Perhaps some such artifice as those enumerated in paragraph
3 (for example, some further limitation of the sub-population within which we
should seek the place of our particular sample) is capable of yielding an exact,
or "studentized," distribution, but this has not yet been found. The problem
is of considerable interest, not only because of its practical importance, but
because of its suggestiveness in connection with general theory.

Numerous other problems having both practical importance and general
theoretical interest are associated with the selection of predictors. For example,
we have not dealt at all with the problem of the number of predictors that
should be used when maximum accuracy in prediction, or in evaluation of the
regression coefficients, is the sole criterion. A particular case is the determination of the degree of the regression polynomial which should be fitted to obtain
maximum accuracy, for example of the number of orthogonal polynomials in
fitting a trend. Such customary criteria as minimizing the estimated variance
of deviations, in which the sum of squares which is the numerator and the
number of degrees of freedom which is the denominator both diminish to zero
as the number of variates is increased, do not rest upon any satisfactory general
theory.

Another related set of problems is concerned with variates more numerous
than the observations on each. It is clear that there is real information inherent in data of this kind, but existing theory and methods, including those of
the present paper, are not adequate to utilize it in a thoroughly efficient manner.
A recent paper of P. L. Hsu [9] is unique in not excluding the case in which the
variates outnumber the observations.

8. Summary. A criterion has been obtained for judging the definiteness of
the selection of a particular variate, from among several available for prediction,
on the basis of its having the maximum sample correlation with the predictand.
A variation of this criterion is applied in paragraph 6 to the problem of extending
the list of variates to be used in a regression formula.
Some of the problems of "nuisance parameters" which affect general theory
are illustrated in this problem. Some outstanding unsolved problems related
to these questions are discussed in paragraph 7.
THE FITTING OF STRAIGHT LINES IF BOTH VARIABLES ARE
SUBJECT TO ERROR

By Abraham Wald

1. Introduction. The problem of fitting straight lines if both variables $x$
and $y$ are subject to error has been treated by many authors. If we have $N > 2$
observed points $(x_i, y_i)$ $(i = 1, \dots, N)$, the usually employed method of least
squares for determining the coefficients $a$, $b$ of the straight line $y = ax + b$
is that of choosing values of $a$ and $b$ which minimize the sum of the squares of
the residuals of the $y$'s, i.e. $\sum(ax_i + b - y_i)^2$ is a minimum. It is well known
that by treating $y$ as an independent variable and minimizing the sum of the
squares of the residuals of the $x$'s, we get a different straight line as best fit. It
has been pointed out¹ that if both variables are subject to error there is no
reason to prefer one of the regression lines described above to the other. For
obtaining the "best fit," which is not necessarily equal to one of the two lines
mentioned, new criteria have to be found. This problem was treated by R. J.
Adcock as early as 1877.²

He defines the line of best fit as the one for which the sum of the squares of
the normal deviates of the $N$ observed points from the line becomes a minimum.
(Another early attempt to solve this problem by minimizing the sum of squares
of the normal deviates was made by Karl Pearson.³)

Many objections can be raised against this method. First, there is no justification for minimizing the sum of the squares of the normal deviates, and not
the deviations in some other direction. Second, the straight line obtained by
that method is not invariant under transformation of the coordinate system.
It is clear that a satisfactory method should give results which do not depend
on the choice of a particular coordinate system. This point has been emphasized by C. F. Roos. He gives⁴ a good summary of the different methods and
then proposes a general formula for fitting lines (and planes in case of more than
two variables) which do not depend on the choice of the coordinate system.


¹ See for instance Henry Schultz, "The Statistical Law of Demand," Journal of Political Economy, Vol. 33, Dec. (1925).

² Analyst, Vol. IV, p. 183 and Vol. V, p. 63.

³ "On Lines and Planes of Closest Fit to Systems of Points in Space," Phil. Mag., 6th Ser., Vol. II (1901).

⁴ "A General Invariant Criterion of Fit for Lines and Planes where all Variates are Subject to Error," Metron, February 1937. See also Oppenheim and Roos, Bulletin of the American Mathematical Society, Vol. 34 (1928), pp. 140-141.
Roos' formula includes many previous solutions⁵ as special cases. H. E. Jones⁶ gives an interesting geometric interpretation of Roos' general formula.

It is a common feature of Roos' general formula and of all other methods proposed in recent years that the fitted straight line cannot be determined without a priori assumptions (independent of the observations) regarding the weights of the errors in the variables $x$ and $y$. That is to say, the standard deviations of the errors in $x$ and in $y$ (or at least their ratio) are involved in the formula of the fitted straight line, and no method is given by which those standard deviations can be estimated by means of the observed values of $x$ and $y$.

R. Frisch⁷ has developed a new general theory of linear regression analysis when all variables are subject to error. His very interesting theory employs quite new methods and is not based on probability concepts. Also, on the basis of Frisch's discussion it seems that there is no way of determining the "true" regression without a priori assumptions about the disturbing intensities.

T. Koopmans⁸ combined Frisch's regression theory with the classical one in a new general theory based on probability concepts. According to his theory, too, the regression line can be determined only if the ratio of the standard deviations of the errors is known.

In a recent paper R. G. D. Allen⁹ gives a new and interesting method for determining the fitted straight line in the case of two variables $x$ and $y$. Denoting by $\sigma_x$ the standard deviation of the errors in $x$, by $\sigma_y$ the standard deviation of the errors in $y$, and by $\rho$ the correlation coefficient between the errors in the two variables, Allen emphasizes (p. 194) that the fitted line can be determined only if the values of two of the three quantities $\sigma_x$, $\sigma_y$, $\rho$ are given a priori.

Finally I should like to mention a paper by C. Eisenhart¹⁰ which contains many interesting remarks related to the subject treated here.

In the present paper I shall deal with the case of two variables $x$ and $y$ in which the errors are uncorrelated. It will be shown that under certain conditions:

(1) The fitted straight line can be determined without making a priori assumptions (independent of the observed values of $x$ and $y$) regarding the standard deviations of the errors.

(2) The standard deviations of the errors can be well estimated by means of the observed values of $x$ and $y$. The precision of the estimate increases with the number of observations and would give the exact values if the number of observations were infinite. (See in this connection also condition V in section 3.)

⁵ For instance also Corrado Gini's method described in his paper "Sull'Interpolazione di una Retta Quando i Valori della Variabile Indipendente sono Affetti da Errori Accidentali," Metron, Vol. I, No. 3 (1921), pp. 63-82.

⁶ "Some Geometrical Considerations in the General Theory of Fitting Lines and Planes," Metron, February 1937.

⁷ Statistical Confluence Analysis by Means of Complete Regression Systems, Oslo, 1934.

⁸ Linear Regression Analysis of Economic Time Series, Haarlem, 1937.

⁹ "The Assumptions of Linear Regression," Economica, May 1939.

¹⁰ "The interpretation of certain regression methods and their use in biological and industrial research," Annals of Math. Stat., Vol. 10 (1939), pp. 162-186.

2. Formulation of the Problem. Let us begin with a precise formulation of the problem. We consider two sets of random variables¹¹

$$x_1, \ldots, x_N; \quad y_1, \ldots, y_N.$$

Denote the expected value $E(x_i)$ of $x_i$ by $X_i$ and the expected value $E(y_i)$ of $y_i$ by $Y_i$ $(i = 1, \ldots, N)$. We shall call $X_i$ the true value of $x_i$, $Y_i$ the true value of $y_i$, $x_i - X_i = \epsilon_i$ the error in the $i$-th term of the $x$-set, and $y_i - Y_i = \eta_i$ the error in the $i$-th term of the $y$-set.

The following assumptions will be made:

I. The random variables $\epsilon_1, \ldots, \epsilon_N$ each have the same distribution and they are uncorrelated, i.e. $E(\epsilon_i\epsilon_j) = 0$ for $i \neq j$. The variance of $\epsilon_i$ is finite.

II. The random variables $\eta_1, \ldots, \eta_N$ each have the same distribution and are uncorrelated, i.e. $E(\eta_i\eta_j) = 0$ for $i \neq j$. The variance of $\eta_i$ is finite.

III. The random variables $\epsilon_i$ and $\eta_j$ $(i = 1, \ldots, N;\ j = 1, \ldots, N)$ are uncorrelated, i.e. $E(\epsilon_i\eta_j) = 0$.

IV. A single linear relation holds between the true values $X$ and $Y$, that is to say $Y_i = \alpha X_i + \beta$ $(i = 1, \ldots, N)$.

Denote by $\epsilon$ a random variable having the same probability distribution as possessed by each of the random variables $\epsilon_1, \ldots, \epsilon_N$, and by $\eta$ a random variable having the same distribution as $\eta_1, \ldots, \eta_N$.
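To make assumptions I-IV concrete, the following Python sketch generates one sample from the model. The function name `simulate_wald_model` is hypothetical, and drawing the errors from independent normal laws is an added assumption: I-III only require identically distributed, uncorrelated errors with finite variance.

```python
import random


def simulate_wald_model(X, alpha, beta, sigma_eps, sigma_eta, seed=None):
    """Draw observed pairs (x_i, y_i) under assumptions I-IV.

    X          -- the true values X_1, ..., X_N, fixed by the caller
    alpha/beta -- coefficients of the structural relation Y_i = alpha*X_i + beta
    sigma_*    -- standard deviations of the errors eps_i and eta_i

    Added assumption: the errors are independent normal variables with zero mean.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for X_i in X:
        Y_i = alpha * X_i + beta                      # assumption IV
        xs.append(X_i + rng.gauss(0.0, sigma_eps))    # x_i = X_i + eps_i
        ys.append(Y_i + rng.gauss(0.0, sigma_eta))    # y_i = Y_i + eta_i
    return xs, ys


# Example: 20 equally spaced true values on the structural line Y = 2X + 1.
x_obs, y_obs = simulate_wald_model([i / 2 for i in range(20)], 2.0, 1.0, 0.3, 0.5, seed=1)
```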

The problem to be solved can be formulated as follows:

We know only two sets of observations: $x'_1, \ldots, x'_N;\ y'_1, \ldots, y'_N$, where $x'_i$ denotes the observed value of $x_i$ and $y'_i$ denotes the observed value of $y_i$. We know neither the true values $X_1, \ldots, X_N;\ Y_1, \ldots, Y_N$, nor the coefficients $\alpha$ and $\beta$ of the linear relation between them. We have to estimate by means of the observations $x'_1, \ldots, x'_N;\ y'_1, \ldots, y'_N$: (1) the values of $\alpha$ and $\beta$, (2) the standard deviation $\sigma_\epsilon$ of $\epsilon$, and (3) the standard deviation $\sigma_\eta$ of $\eta$.

Problems of this kind often occur in economics, where we are dealing with time series. For example, denote by $x_i$ the price of a certain good $G$ in the period $t_i$, and by $y_i$ the quantity of $G$ demanded in $t_i$. In each time period $t_i$ there exists a normal price $X_i$ and a normal demand $Y_i$ which would obtain if the influence of some accidental disturbances could be eliminated. If we have reason to assume that a linear relationship exists between the normal price and the normal demand, we have to deal with a problem of the kind described above.

In the following discussions we shall use the notations $x_i$ and $y_i$ also for their observed values $x'_i$ and $y'_i$, since it will be clear in which sense they are meant and no confusion can arise.

¹¹ A random or stochastic variable is a real variable associated with a probability distribution.


3. Consistent Estimates of the Parameters $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. For the sake of simplicity we assume that $N$ is even. We consider the expressions

$$a_1 = \frac{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}{N} \tag{1}$$

$$a_2 = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{N} \tag{2}$$

where $m = N/2$. As an estimate of $\alpha$ we shall use the expression

$$a = \frac{a_2}{a_1} = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}.$$

We make the assumption

V. The limit inferior of

$$\frac{\left|(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)\right|}{N} \qquad (N = 2, 3, \ldots \text{ ad inf.})$$

is positive.

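We shall prove that $a$ is a consistent estimate of $\alpha$, i.e. $a$ converges stochastically to $\alpha$ as $N \to \infty$, if the assumptions I-V hold. Denote the expected value of $a_1$ by $\bar a_1$ and the expected value of $a_2$ by $\bar a_2$. It is obvious that

$$\bar a_1 = \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N}, \qquad \bar a_2 = \frac{(Y_1 + \cdots + Y_m) - (Y_{m+1} + \cdots + Y_N)}{N}. \tag{3}$$

On account of condition IV we have

$$\bar a_2 = \alpha\bar a_1, \quad\text{or}\quad \frac{\bar a_2}{\bar a_1} = \alpha. \tag{4}$$

Since $a_1 - \bar a_1 = [(\epsilon_1 + \cdots + \epsilon_m) - (\epsilon_{m+1} + \cdots + \epsilon_N)]/N$ is a sum of $N$ uncorrelated terms, each with variance $\sigma_\epsilon^2/N^2$, the variance of $a_1 - \bar a_1$ is equal to $\sigma_\epsilon^2/N$; in the same way the variance of $a_2 - \bar a_2$ is equal to $\sigma_\eta^2/N$. Hence $a_1$ and $a_2$ converge stochastically towards $\bar a_1$ and $\bar a_2$ respectively. From that and assumption V it follows that $a_2/a_1$ also converges stochastically towards $\bar a_2/\bar a_1 = \alpha$. The intercept $\beta$ of the regression line will be estimated by

$$b = \bar y - a\bar x, \quad\text{where } \bar x = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar y = \frac{y_1 + \cdots + y_N}{N}. \tag{5}$$

Denote by $\bar X$ the arithmetic mean of $X_1, \ldots, X_N$ and by $\bar Y$ the arithmetic mean of $Y_1, \ldots, Y_N$. Since $\bar y$ converges stochastically towards $\bar Y$, $\bar x$ towards $\bar X$, and $a$ towards $\alpha$, $b$ converges stochastically towards $\bar Y - \alpha\bar X$. From condition IV it follows that $\bar Y - \alpha\bar X = \beta$. Hence $b$ converges stochastically towards $\beta$.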
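The estimates $a = a_2/a_1$ and $b = \bar y - a\bar x$ need nothing beyond group sums. A minimal Python sketch, assuming (as in this section) that the first $m = N/2$ pairs in the order supplied form the first group; the function name `wald_slope_intercept` is hypothetical.

```python
def wald_slope_intercept(x, y):
    """Return (a, b, a1, a2) computed from equations (1), (2) and (5).

    The first m = N/2 pairs, in the order given, form the first group and
    the remaining pairs the second group; N must be even.
    """
    N = len(x)
    if N != len(y) or N % 2 != 0:
        raise ValueError("x and y must have the same even length")
    m = N // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / N   # equation (1)
    a2 = (sum(y[:m]) - sum(y[m:])) / N   # equation (2)
    if a1 == 0:
        raise ZeroDivisionError("a1 = 0: this grouping gives no information on the slope")
    a = a2 / a1                          # estimate of alpha
    b = sum(y) / N - a * sum(x) / N      # equation (5): b = ybar - a * xbar
    return a, b, a1, a2


# Points lying exactly on y = 2x + 1 give back alpha = 2 and beta = 1.
print(wald_slope_intercept([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0, -1.0, -2.0)
```

Because the grouping follows the order of the input lists, reordering the pairs changes the estimate; section 7 discusses which orderings are admissible.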

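Let us introduce the following notations:

$$s_x^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x)^2, \quad s_x = \text{sample standard deviation of the } x\text{-observations},$$

$$s_y^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar y)^2, \quad s_y = \text{sample standard deviation of the } y\text{-observations},$$

$$s_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x)(y_i - \bar y) = \text{sample covariance between the } x\text{-set and the } y\text{-set}.$$

$S_X$, $S_Y$ and $S_{XY}$ denote the same expressions of the true values $X_1, \ldots, X_N;\ Y_1, \ldots, Y_N$.

It is obvious that¹²

$$E(s_x^2) = S_X^2 + \frac{N-1}{N}\sigma_\epsilon^2, \tag{6}$$

$$E(s_y^2) = S_Y^2 + \frac{N-1}{N}\sigma_\eta^2, \tag{7}$$

$$E(s_{xy}) = S_{XY}, \tag{8}$$

where $E(s_x^2)$, $E(s_y^2)$ and $E(s_{xy})$ denote the expected values of $s_x^2$, $s_y^2$ and $s_{xy}$. Since $Y_i = \alpha X_i + \beta$, we have

$$S_Y^2 = \alpha S_{XY}, \tag{9}$$

$$S_{XY} = \alpha S_X^2. \tag{10}$$

From (8), (9) and (10) we get

$$S_X^2 = \frac{E(s_{xy})}{\alpha}, \tag{11}$$

$$S_Y^2 = \alpha E(s_{xy}). \tag{12}$$

If we substitute in (6) and (7) for $S_X^2$ and $S_Y^2$ their values in (11) and (12), we get

$$\sigma_\epsilon^2 = \left[E(s_x^2) - \frac{E(s_{xy})}{\alpha}\right]\frac{N}{N-1}, \tag{13}$$

$$\sigma_\eta^2 = \left[E(s_y^2) - \alpha E(s_{xy})\right]\frac{N}{N-1}. \tag{14}$$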


¹² I observe that the equations (6), (7) and (8) are essentially the same as those investigated by R. Frisch, Statistical Confluence Analysis, pp. 51-52. See also Allen's equations (4), l.c., p. 194.
Since $s_x^2$, $s_y^2$ and $s_{xy}$ converge stochastically towards their expected values and $a$ converges stochastically towards $\alpha$, the expressions

$$\left[s_x^2 - \frac{s_{xy}}{a}\right]\frac{N}{N-1} \tag{15}$$

and

$$\left[s_y^2 - a\,s_{xy}\right]\frac{N}{N-1} \tag{16}$$

are consistent estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ respectively.
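Expressions (15) and (16) can be computed directly from the sample moments. The sketch below is illustrative (the name `wald_error_variances` is hypothetical) and uses the same first-half/second-half grouping as (1) and (2).

```python
def wald_error_variances(x, y):
    """Estimates (15) and (16) of sigma_eps^2 and sigma_eta^2 (N even)."""
    N = len(x)
    if N != len(y) or N % 2 != 0:
        raise ValueError("x and y must have the same even length")
    m = N // 2
    # a = a2 / a1; the common factor 1/N cancels in the ratio.
    a = (sum(y[:m]) - sum(y[m:])) / (sum(x[:m]) - sum(x[m:]))
    xbar, ybar = sum(x) / N, sum(y) / N
    sx2 = sum((xi - xbar) ** 2 for xi in x) / N                        # s_x^2
    sy2 = sum((yi - ybar) ** 2 for yi in y) / N                        # s_y^2
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / N   # s_xy
    var_eps = (sx2 - sxy / a) * N / (N - 1)   # expression (15)
    var_eta = (sy2 - a * sxy) * N / (N - 1)   # expression (16)
    return var_eps, var_eta


# Error-free data: both estimated error variances are zero.
print(wald_error_variances([0, 1, 2, 3], [1, 3, 5, 7]))
```

In small samples either expression may come out negative, since each is a difference of two estimates; consistency only guarantees good behaviour as $N$ grows.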


4. Confidence Interval for $\alpha$. In this section, as well as in sections 5 and 6, only the assumptions I-IV are assumed to hold. In other words, all statements made in these sections are valid independently of Assumption V, except where the contrary is explicitly stated.
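Let us introduce the following notation:

$$\bar x_1 = \frac{x_1 + \cdots + x_m}{m}, \quad \bar y_1 = \frac{y_1 + \cdots + y_m}{m}, \quad \bar x_2 = \frac{x_{m+1} + \cdots + x_N}{m}, \quad \bar y_2 = \frac{y_{m+1} + \cdots + y_N}{m},$$

$$(s'_x)^2 = \frac{\sum_{i=1}^{m}(x_i - \bar x_1)^2 + \sum_{j=m+1}^{N}(x_j - \bar x_2)^2}{N},$$

$$(s'_y)^2 = \frac{\sum_{i=1}^{m}(y_i - \bar y_1)^2 + \sum_{j=m+1}^{N}(y_j - \bar y_2)^2}{N},$$

$$s'_{xy} = \frac{\sum_{i=1}^{m}(x_i - \bar x_1)(y_i - \bar y_1) + \sum_{j=m+1}^{N}(x_j - \bar x_2)(y_j - \bar y_2)}{N}.$$

$\bar X_1$, $\bar X_2$, $\bar Y_1$, $\bar Y_2$, $(S'_X)^2$, $(S'_Y)^2$ and $S'_{XY}$ denote the same functions of the true values $X_1, \ldots, X_N;\ Y_1, \ldots, Y_N$. The expressions $s'_x$, $s'_y$ and $s'_{xy}$ are slightly different from the corresponding expressions $s_x$, $s_y$ and $s_{xy}$. The reason for introducing these new expressions is that the distributions of $s_x$, $s_y$ and $s_{xy}$ are not independent of the slope $a = a_2/a_1$ of the sample regression line, but $s'_x$, $s'_y$ and $s'_{xy}$ are distributed independently of $a$ (assuming that $\epsilon$ and $\eta$ are normally distributed). The latter statement follows easily from the fact that according to (1) and (2)

$$a = \frac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2},$$

and $s'_x$, $s'_y$, $s'_{xy}$ are distributed independently of $\bar x_1$, $\bar x_2$, $\bar y_1$ and $\bar y_2$.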
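In the same way as we derived (13) and (14), we get

$$\sigma_\epsilon^2 = \left[E\big((s'_x)^2\big) - \frac{E(s'_{xy})}{\alpha}\right]\frac{N}{N-2}, \tag{13'}$$

$$\sigma_\eta^2 = \left[E\big((s'_y)^2\big) - \alpha E(s'_{xy})\right]\frac{N}{N-2}. \tag{14'}$$

These formulae differ from the corresponding formulae (13) and (14) only in the denominator of the second factor, having there $N-2$ instead of $N-1$. This is due to the fact that the estimates $s_x$, $s_y$, $s_{xy}$ are based on $N-1$ degrees of freedom, whereas $s'_x$, $s'_y$ and $s'_{xy}$ are based only on $N-2$ degrees of freedom. From (13') and (14') we get the following estimates¹³ for $\sigma_\epsilon^2$ and $\sigma_\eta^2$:

$$\left[(s'_x)^2 - \frac{s'_{xy}}{\alpha}\right]\frac{N}{N-2}, \tag{17}$$

$$\left[(s'_y)^2 - \alpha s'_{xy}\right]\frac{N}{N-2}. \tag{18}$$

Hence we get as an estimate of $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ the expression

$$s^2 = \left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right]\frac{N}{N-2} = \frac{\sum_{i=1}^{m}\left[(y_i - \alpha x_i) - (\bar y_1 - \alpha\bar x_1)\right]^2 + \sum_{j=m+1}^{N}\left[(y_j - \alpha x_j) - (\bar y_2 - \alpha\bar x_2)\right]^2}{N-2}. \tag{19}$$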
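The two forms of (19), one built from the within-group moments and one from the residuals of $y - \alpha x$ about the group means, can be checked against each other numerically. In this sketch the function names are hypothetical, the groups are again the first and second halves of the data, and `alpha` is the assumed slope appearing in (19).

```python
def within_group_moments(x, y):
    """(s'_x)^2, (s'_y)^2 and s'_xy: deviations from each group's own mean."""
    N = len(x)
    m = N // 2
    sxx = syy = sxy = 0.0
    for lo, hi in ((0, m), (m, N)):              # first group, then second group
        gx, gy = x[lo:hi], y[lo:hi]
        mx, my = sum(gx) / m, sum(gy) / m        # group means of x and y
        for xi, yi in zip(gx, gy):
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    return sxx / N, syy / N, sxy / N


def s_squared(x, y, alpha):
    """Left-hand form of (19)."""
    N = len(x)
    sx2p, sy2p, sxyp = within_group_moments(x, y)
    return (sy2p + alpha ** 2 * sx2p - 2 * alpha * sxyp) * N / (N - 2)


def s_squared_direct(x, y, alpha):
    """Right-hand form of (19): residuals of y - alpha*x about the group means."""
    N = len(x)
    m = N // 2
    total = 0.0
    for lo, hi in ((0, m), (m, N)):
        d = [yi - alpha * xi for xi, yi in zip(x[lo:hi], y[lo:hi])]
        dbar = sum(d) / m
        total += sum((di - dbar) ** 2 for di in d)
    return total / (N - 2)


print(s_squared([0, 1, 2, 5], [1, 2, 6, 9], 1.5), s_squared_direct([0, 1, 2, 5], [1, 2, 6, 9], 1.5))
```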

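Now we shall show that

$$\frac{(N-2)\,s^2}{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2} \tag{20}$$

has the $\chi^2$-distribution with $N-2$ degrees of freedom, provided that $\epsilon$ and $\eta$ are normally distributed. In fact,

$$(y_i - \alpha x_i) - (\bar y_1 - \alpha\bar x_1) = \eta_i - \alpha\epsilon_i - (\bar\eta_1 - \alpha\bar\epsilon_1) \qquad (i = 1, \ldots, m)$$

and

$$(y_j - \alpha x_j) - (\bar y_2 - \alpha\bar x_2) = \eta_j - \alpha\epsilon_j - (\bar\eta_2 - \alpha\bar\epsilon_2) \qquad (j = m+1, \ldots, N),$$

where

$$\bar\epsilon_1 = \frac{\epsilon_1 + \cdots + \epsilon_m}{m}, \quad \bar\epsilon_2 = \frac{\epsilon_{m+1} + \cdots + \epsilon_N}{m}, \quad \bar\eta_1 = \frac{\eta_1 + \cdots + \eta_m}{m}, \quad \bar\eta_2 = \frac{\eta_{m+1} + \cdots + \eta_N}{m}.$$

Since the variance of $\eta_k - \alpha\epsilon_k$ is equal to $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ and since $\eta_k - \alpha\epsilon_k$ is uncorrelated with $\eta_l - \alpha\epsilon_l$ $(k \neq l;\ k, l = 1, \ldots, N)$, each of the two groups contributes a sum of squared deviations from its own mean with $m-1$ degrees of freedom, and the expression (20) has the $\chi^2$-distribution with $2(m-1) = N-2$ degrees of freedom.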


¹³ An "estimate" is usually a function of the observations not involving any unknown parameters. We designate here as estimates also some functions involving the parameter $\alpha$.
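Now we shall show that

$$\frac{\sqrt N\, a_1(a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \tag{21}$$

is normally distributed with zero mean and unit variance. In fact, from the equations (1)-(4) it follows that

$$a_1(a - \alpha) = a_2 - \alpha a_1 = \left(\bar a_2 + \frac{\bar\eta_1 - \bar\eta_2}{2}\right) - \alpha\left(\bar a_1 + \frac{\bar\epsilon_1 - \bar\epsilon_2}{2}\right) = \frac{\bar\eta_1 - \bar\eta_2}{2} - \alpha\,\frac{\bar\epsilon_1 - \bar\epsilon_2}{2}.$$

Since the latter expression is normally distributed (provided that $\epsilon$ and $\eta$ are normally distributed) with zero mean and variance

$$\frac{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}{N},$$

our statement about (21) is proved.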
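The distributional claim about (21) lends itself to a quick Monte Carlo check. The sketch below assumes normal errors and fixed true values $X_i = 0, 1, \ldots, N-1$; these, the parameter values and the function names are illustrative choices, not part of the text. The sample mean and variance of the simulated values of (21) should be close to 0 and 1.

```python
import math
import random
import statistics


def pivot_21(x, y, alpha, sigma_eps, sigma_eta):
    """Expression (21), evaluated with the true alpha and the true error scales."""
    N = len(x)
    m = N // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / N
    a2 = (sum(y[:m]) - sum(y[m:])) / N
    # a1 * (a - alpha) with a = a2 / a1 equals a2 - alpha * a1.
    return math.sqrt(N) * (a2 - alpha * a1) / math.sqrt(sigma_eta ** 2 + alpha ** 2 * sigma_eps ** 2)


def simulate_pivot(reps=4000, N=20, alpha=2.0, beta=1.0, sigma_eps=0.3, sigma_eta=0.5, seed=0):
    """Repeat the experiment with fixed true values X_i = i and normal errors."""
    rng = random.Random(seed)
    X = list(range(N))
    draws = []
    for _ in range(reps):
        x = [X_i + rng.gauss(0.0, sigma_eps) for X_i in X]
        y = [alpha * X_i + beta + rng.gauss(0.0, sigma_eta) for X_i in X]
        draws.append(pivot_21(x, y, alpha, sigma_eps, sigma_eta))
    return draws


z = simulate_pivot()
print(round(statistics.mean(z), 2), round(statistics.variance(z), 2))  # near 0 and 1
```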

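Obviously (20) and (21) are independently distributed, since (20) depends only on the deviations of the errors from their group means, which are independent of $\bar\epsilon_1$, $\bar\epsilon_2$, $\bar\eta_1$, $\bar\eta_2$ when $\epsilon$ and $\eta$ are normal. Hence $\sqrt{N-2}$ times the ratio of (21) to the square root of (20), namely,

$$t = \frac{\sqrt N\, a_1(a - \alpha)}{s} = \frac{a_1(a - \alpha)\sqrt{N-2}}{\sqrt{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}}}, \tag{22}$$

has the Student distribution with $N-2$ degrees of freedom. Denote by $t_0$ the critical value of $t$ corresponding to a chosen probability level. The deviation of $a$ from an assumed population value $\alpha$ is significant if

$$\frac{\left|a_1(a - \alpha)\right|\sqrt{N-2}}{\sqrt{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}}} \geq t_0.$$

The confidence limits for $\alpha$ are obtained by solving the following equation in $\alpha$:

$$a_1^2(a - \alpha)^2 = \frac{t_0^2}{N-2}\left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right]. \tag{23}$$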


Now we shall show that if the relation

$$a_1^2 > \frac{t_0^2}{N-2}(s'_x)^2 \tag{24}$$

holds, the roots $\alpha_1$ and $\alpha_2$ of (23) are real and $a$ is contained in the interior of the interval $[\alpha_1, \alpha_2]$. From (19) it follows that

$$(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy} > 0$$

for all values of $\alpha$. Hence, for $\alpha = a$ the left-hand side of (23), which then vanishes, is smaller than the right-hand side. Both sides of (23) are quadratic in $\alpha$, and on account of (24) the coefficient of $\alpha^2$ on the left exceeds that on the right; therefore there exists a value $\alpha' > a$ and a value $\alpha'' < a$ such that the left-hand side of (23) is greater than the right-hand side for $\alpha = \alpha'$ and $\alpha = \alpha''$. Hence one root must lie between $a$ and $\alpha'$ and the other root between $\alpha''$ and $a$. This proves our statement. The relation (24) always holds for sufficiently large $N$ if Assumption V is fulfilled. The confidence interval of $\alpha$ is the interval $[\alpha_1, \alpha_2]$. For very small $N$, (24) may not hold.

Finally I should like to remark that no essentially better estimate of the variance of $\eta - \alpha\epsilon$ can be given than the expression $s^2$ in (19). In fact, we have $2N$ observations $x_1, \ldots, x_N;\ y_1, \ldots, y_N$. For the estimation of the variance of $\eta - \alpha\epsilon$ we must eliminate the unknowns $X_1, \ldots, X_N$ and $\beta$. (The unknowns $Y_1, \ldots, Y_N$ are determined by the relations $Y_i = \alpha X_i + \beta$, and $\alpha$ is involved in the expression whose variance is to be determined.) Hence we have at most $N-1$ degrees of freedom, and the estimate in (19) is based on $N-2$ degrees of freedom.

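5. Confidence Interval for $\beta$ if $\alpha$ is Given. In this case the best estimate of $\beta$ is given by the expression

$$b_\alpha = \bar y - \alpha\bar x, \quad\text{where } \bar x = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar y = \frac{y_1 + \cdots + y_N}{N}.$$

We have

$$b_\alpha - \beta = (\bar y - \bar Y) - \alpha(\bar x - \bar X) = \bar\eta - \alpha\bar\epsilon,$$

where

$$\bar\epsilon = \frac{\epsilon_1 + \cdots + \epsilon_N}{N}, \qquad \bar\eta = \frac{\eta_1 + \cdots + \eta_N}{N}.$$

Hence

$$\frac{\sqrt N\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \tag{25}$$

is normally distributed with zero mean and unit variance. It is obvious that the expressions (20) and (25) are independently distributed. Hence $\sqrt{N-2}$ times the ratio of (25) to the square root of (20), i.e.

$$t = \frac{\sqrt N\,(b_\alpha - \beta)}{s} = \frac{(b_\alpha - \beta)\sqrt{N-2}}{\sqrt{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}}},$$

has the Student distribution with $N-2$ degrees of freedom. Denoting by $t_0$ the critical value of $t$ according to the chosen probability level, the confidence interval for $\beta$ is given by

$$\left[\, b_\alpha - t_0\,\frac{\sqrt{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N-2}},\ \ b_\alpha + t_0\,\frac{\sqrt{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N-2}} \,\right].$$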
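With $\alpha$ known, the interval for $\beta$ is $b_\alpha = \bar y - \alpha\bar x$ plus or minus $t_0$ times the within-group residual scale divided by $\sqrt{N-2}$. A minimal sketch with a hypothetical function name; `t0` is taken from a table of Student's distribution with $N-2$ degrees of freedom and passed in by the caller.

```python
import math


def intercept_confidence_interval(x, y, alpha, t0):
    """Confidence interval for beta when alpha is known; first/second halves as groups."""
    N = len(x)
    m = N // 2
    b_alpha = sum(y) / N - alpha * sum(x) / N     # b_alpha = ybar - alpha * xbar
    # Within-group moments (s'_x)^2, (s'_y)^2 and s'_xy, accumulated before dividing by N.
    sxx = syy = sxy = 0.0
    for lo, hi in ((0, m), (m, N)):
        mx, my = sum(x[lo:hi]) / m, sum(y[lo:hi]) / m
        for xi, yi in zip(x[lo:hi], y[lo:hi]):
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    q = (syy + alpha ** 2 * sxx - 2 * alpha * sxy) / N
    half_width = t0 * math.sqrt(max(q, 0.0)) / math.sqrt(N - 2)
    return b_alpha - half_width, b_alpha + half_width


# Error-free data on y = 2x + 1: the interval collapses to the point beta = 1.
print(intercept_confidence_interval([0, 1, 2, 3], [1, 3, 5, 7], 2.0, 4.303))
```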
6. Confidence Region for $\alpha$ and $\beta$ Jointly. In most practical cases we want to know confidence limits for $\alpha$ and $\beta$ jointly. A pair of values $\alpha$, $\beta$ can be represented in the plane by the point with the coordinates $\alpha$, $\beta$. A region $R$ of this plane is called a confidence region of the true point $(\alpha, \beta)$ corresponding to the probability level $P$ if the following two conditions are fulfilled:

(1) The region $R$ is a function of the observations $x_1, \ldots, x_N;\ y_1, \ldots, y_N$, i.e. it is uniquely determined by the observations.

(2) Before performing the experiment, the probability that we shall obtain observed values such that $(\alpha, \beta)$ will be contained in $R$ is exactly equal to $P$.

$P$ is usually chosen to be equal to .95 or .99.

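We have shown that the expressions (21) and (25), i.e.

$$\frac{\sqrt N\, a_1(a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \quad\text{and}\quad \frac{\sqrt N\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}},$$

are normally distributed with zero mean and unit variance. Now we shall show that these two quantities are independently distributed. For this purpose we have only to show that $\bar x$, $\bar y$, $a_1$ and $a_2$ are independently distributed ($a_1$ and $a_2$ are defined in (1) and (2)). But since

$$a_1 - E(a_1) = \frac{\bar\epsilon_1 - \bar\epsilon_2}{2}, \quad a_2 - E(a_2) = \frac{\bar\eta_1 - \bar\eta_2}{2}, \quad \bar x - E(\bar x) = \bar\epsilon, \quad \bar y - E(\bar y) = \bar\eta,$$

we have only to show that $\bar\epsilon$, $\bar\eta$, $\bar\epsilon_1 - \bar\epsilon_2$ and $\bar\eta_1 - \bar\eta_2$ are independently distributed. We obviously have

$$\bar\epsilon = \frac{\bar\epsilon_1 + \bar\epsilon_2}{2}, \qquad \bar\eta = \frac{\bar\eta_1 + \bar\eta_2}{2}.$$

It is evident that $\bar\epsilon_1$, $\bar\epsilon_2$, $\bar\eta_1$ and $\bar\eta_2$ are independently distributed. Hence

$$E\left[\bar\epsilon(\bar\epsilon_1 - \bar\epsilon_2)\right] = \frac{E(\bar\epsilon_1^2) - E(\bar\epsilon_2^2)}{2} = 0 \quad\text{and also}\quad E\left[\bar\eta(\bar\eta_1 - \bar\eta_2)\right] = \frac{E(\bar\eta_1^2) - E(\bar\eta_2^2)}{2} = 0.$$

Since $\bar\epsilon_1 - \bar\epsilon_2$, $\bar\eta_1 - \bar\eta_2$, $\bar\epsilon$ and $\bar\eta$ are normally distributed, the independence of this set of variables is proved, and therefore (21) and (25) are also independently distributed. It is obvious that the expression (20) is distributed independently of (21) and (25). From this it follows that

$$\frac{N-2}{2}\cdot\frac{N\left[a_1^2(a - \alpha)^2 + (\bar y - \alpha\bar x - \beta)^2\right]}{(N-2)\,s^2} = \frac{(N-2)\left[a_1^2(a - \alpha)^2 + (\bar y - \alpha\bar x - \beta)^2\right]}{2\left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right]} \tag{26}$$

has the $F$-distribution (analysis of variance distribution) with 2 and $N-2$ degrees of freedom. The $F$-distribution is tabulated in Snedecor's book: Calculation and Interpretation of Analysis of Variance, Collegiate Press, Ames, Iowa, 1934. The distribution of $\frac{1}{2}\log F = z$ is tabulated in R. A. Fisher's book: Statistical Methods for Research Workers, London, 1936. Denote by $F_0$ the critical value of $F$ corresponding to the chosen probability level $P$. Then the confidence region $R$ is the set of points $(\alpha, \beta)$ which satisfy the inequality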


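$$\frac{N-2}{2}\cdot\frac{a_1^2(a - \alpha)^2 + (\bar y - \alpha\bar x - \beta)^2}{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}} \leq F_0. \tag{27}$$

The boundary of the region is given by the equation

$$a_1^2(a - \alpha)^2 + (\bar y - \alpha\bar x - \beta)^2 = \frac{2F_0}{N-2}\left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right]. \tag{28}$$

Provided that $a_1^2 > \frac{2F_0}{N-2}(s'_x)^2$ (the analogue of (24)), this is the equation of an ellipse. Hence the region $R$ is the interior of the ellipse defined by the equation (28). If Assumption V holds, the lengths of the axes of the ellipse are of the order $1/\sqrt N$; hence with increasing $N$ the ellipse shrinks to a point.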
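Whether a candidate point $(\alpha, \beta)$ lies in the region $R$ can be decided by evaluating the left-hand side of inequality (27) and comparing it with $F_0$. In this sketch the names are hypothetical, the groups are the first and second halves of the data, and `F0` is the tabulated critical value of $F$ with 2 and $N-2$ degrees of freedom, supplied by the caller.

```python
def joint_region_statistic(x, y, alpha, beta):
    """Left-hand side of inequality (27) for the candidate point (alpha, beta)."""
    N = len(x)
    m = N // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / N
    a2 = (sum(y[:m]) - sum(y[m:])) / N
    a = a2 / a1
    xbar, ybar = sum(x) / N, sum(y) / N
    sxx = syy = sxy = 0.0
    for lo, hi in ((0, m), (m, N)):
        mx, my = sum(x[lo:hi]) / m, sum(y[lo:hi]) / m
        for xi, yi in zip(x[lo:hi], y[lo:hi]):
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    q = (syy + alpha ** 2 * sxx - 2 * alpha * sxy) / N     # (s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy
    num = a1 ** 2 * (a - alpha) ** 2 + (ybar - alpha * xbar - beta) ** 2
    return (N - 2) / 2 * num / q


def in_joint_region(x, y, alpha, beta, F0):
    """True if (alpha, beta) satisfies inequality (27)."""
    return joint_region_statistic(x, y, alpha, beta) <= F0


xs = [0, 1, 2, 3, 10, 11, 12, 13]
ys = [1.2, 2.9, 5.1, 7.0, 21.3, 22.8, 25.1, 27.2]
print(in_joint_region(xs, ys, 2.0, 1.0, 5.14))  # 5.14: 5% point of F with 2 and 6 df
```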


7. The Grouping of the Observations. We have divided the observations into two equal groups $G_1$ and $G_2$, $G_1$ containing the first half $(x_1, y_1), \ldots, (x_m, y_m)$ and $G_2$ the second half $(x_{m+1}, y_{m+1}), \ldots, (x_N, y_N)$ of the observations. All the formulas and statements of the previous sections remain exactly valid for any arbitrary subdivision of the observations into two equal groups, provided that the subdivision is defined independently of the errors $\epsilon_1, \ldots, \epsilon_N;\ \eta_1, \ldots, \eta_N$. The question then arises which grouping is the most advantageous, i.e. for which grouping $a$ will be the most efficient estimate of $\alpha$ (will lead to the shortest confidence interval for $\alpha$). Since by (23) the length of the confidence interval is roughly inversely proportional to $|a_1|$, it is easy to see that the greater $|a_1|$, the more efficient is the estimate $a$ of $\alpha$. The expression $|a_1|$ becomes a maximum if we order the observations such that $x_1 \leq x_2 \leq \cdots \leq x_N$. That is to say, $|a_1|$ becomes a maximum if we group the observations according to the following:

Rule I. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements $x_j$ $(j \neq i)$ of the series $x_1, \ldots, x_N$ for which $x_j < x_i$ is less than $m = N/2$. The point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $x_j$ $(j \neq i)$ for which $x_j < x_i$ is greater than or equal to $m$.
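Rule I can be implemented literally by counting, for each point, how many other abscissae are smaller. The sketch below (names hypothetical) returns the two index sets and the corresponding slope $a' = (\bar y_1 - \bar y_2)/(\bar x_1 - \bar x_2)$. Rule I itself says nothing about ties; here a tie that prevents an equal split is reported as an error, which is an assumption of the sketch.

```python
def rule_one_groups(x):
    """Index lists (G1, G2) defined by Rule I."""
    N = len(x)
    m = N // 2
    g1, g2 = [], []
    for i, xi in enumerate(x):
        smaller = sum(1 for j, xj in enumerate(x) if j != i and xj < xi)
        (g1 if smaller < m else g2).append(i)   # fewer than m smaller values -> G1
    if len(g1) != m:
        raise ValueError("ties in x prevent an equal split under Rule I")
    return g1, g2


def slope_from_groups(x, y, g1, g2):
    """a = (ybar_1 - ybar_2) / (xbar_1 - xbar_2) for two equal index groups."""
    def mean(v, idx):
        return sum(v[i] for i in idx) / len(idx)
    return (mean(y, g1) - mean(y, g2)) / (mean(x, g1) - mean(x, g2))


x, y = [3, 0, 2, 1], [7, 1, 5, 3]
g1, g2 = rule_one_groups(x)
print(g1, g2, slope_from_groups(x, y, g1, g2))  # [1, 3] [0, 2] 2.0
```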

This grouping, however, depends on the observed values $x_1, \ldots, x_N$ and is therefore in general not entirely independent of the errors $\epsilon_1, \ldots, \epsilon_N$. Let us now consider the grouping according to the following:

Rule II. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements $X_j$ of the series $X_1, \ldots, X_N$ for which $X_j < X_i$ $(j \neq i)$ is less than $m$. The point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $X_j$ for which $X_j < X_i$ $(j \neq i)$ is equal to or greater than $m$.
The grouping according to Rule II is entirely independent of the errors $\epsilon_1, \ldots, \epsilon_N;\ \eta_1, \ldots, \eta_N$. It is identical with the grouping according to Rule I in the following case. Denote by $\tilde x$ the median of $x_1, \ldots, x_N$; assume that $\epsilon$ can take values only within the finite interval $[-c, +c]$ and that all the values $x_1, \ldots, x_N$ fall outside the interval $[\tilde x - c, \tilde x + c]$. It is easy to see that in this case $x_i < \tilde x$ $(i = 1, \ldots, N)$ holds if and only if $X_i < \tilde X$, where $\tilde X$ denotes the median of $X_1, \ldots, X_N$. Hence the grouping according to Rule II is identical to that according to Rule I, and therefore the grouping according to Rule I is independent of the errors $\epsilon_1, \ldots, \epsilon_N;\ \eta_1, \ldots, \eta_N$. In such cases we get the best estimate of $\alpha$ by grouping the observations according to Rule I. In practice, we can use the grouping according to Rule I and regard it as independent of the errors $\epsilon_1, \ldots, \epsilon_N;\ \eta_1, \ldots, \eta_N$ if there exists a positive value $c$ for which the probability that $|\epsilon| > c$ is negligibly small and the number of observations contained in $[\tilde x - c, \tilde x + c]$ is also very small.

Denote by $a'$ the value of $a$ which we obtain by grouping the observations according to Rule I and by $a''$ the value of $a$ if we group the observations according to Rule II. The value $a''$ is in general unknown, since the values $X_1, \ldots, X_N$ are unknown, except in the special case considered above, where we have $a'' = a'$. We will now show that an upper and a lower limit for $a''$ can always be given. First, we have to determine a positive value $c$ such that the probability that $|\epsilon| > c$ is negligibly small. The value of $c$ may often be determined before we make the observations, from some a priori knowledge about the possible range of the errors. If this is not the case, we can estimate the value of $c$ from the data. It is well known that if we have errors in both variables and fit a straight line by the method of least squares minimizing in the $x$-direction, the sum of the squared deviations divided by the number of degrees of freedom will overestimate $\sigma_\epsilon^2$. Hence, if $\epsilon$ is normally distributed, we can consider the interval $[-3v, 3v]$ as the possible range of $\epsilon$, i.e. $c = 3v$, where $v^2$ denotes the sum of the squared residuals divided by the number of degrees of freedom. If the distribution of $\epsilon$ is unknown, we shall have to take a somewhat larger value for $c$, for instance $c = 5v$. After having determined $c$, upper and lower limits for $a''$ can be given as follows. We consider the system $S$ of all possible groupings into two equal groups satisfying the conditions:

(1) If $x_i < \tilde x - c$, the point $(x_i, y_i)$ belongs to the group $G_1$.

(2) If $x_i > \tilde x + c$, the point $(x_i, y_i)$ belongs to the group $G_2$.

We calculate the value of $a$ according to each grouping of the system $S$ and denote the minimum of these values by $a^*$ and the maximum by $a^{**}$. Since the grouping according to Rule II is contained in the system $S$, $a^*$ is a lower and $a^{**}$ an upper limit of $a''$.
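For a modest number of points near the median $\tilde x$ of $x_1, \ldots, x_N$, the limits $a^*$ and $a^{**}$ can be found by enumerating the system $S$ exhaustively. The sketch assumes that $c$ has already been chosen as described above; groupings with $\bar x_1 = \bar x_2$ (so that $a_1 = 0$) are skipped, and all names are hypothetical.

```python
import statistics
from itertools import combinations


def group_slope(x, y, g1, g2):
    """a = (ybar_1 - ybar_2) / (xbar_1 - xbar_2) for two equal index groups."""
    def mean(v, idx):
        return sum(v[i] for i in idx) / len(idx)
    return (mean(y, g1) - mean(y, g2)) / (mean(x, g1) - mean(x, g2))


def slope_bounds(x, y, c):
    """(a*, a**): smallest and largest slope over all groupings in the system S."""
    N = len(x)
    m = N // 2
    med = statistics.median(x)
    forced1 = [i for i in range(N) if x[i] < med - c]          # condition (1)
    forced2 = [i for i in range(N) if x[i] > med + c]          # condition (2)
    free = [i for i in range(N) if med - c <= x[i] <= med + c]
    need = m - len(forced1)                  # free points still required in G1
    if need < 0 or need > len(free):
        raise ValueError("no equal split satisfies conditions (1) and (2)")
    slopes = []
    for chosen in combinations(free, need):
        g1 = forced1 + list(chosen)
        g2 = forced2 + [i for i in free if i not in chosen]
        if sum(x[i] for i in g1) != sum(x[i] for i in g2):     # skip a1 = 0
            slopes.append(group_slope(x, y, g1, g2))
    if not slopes:
        raise ValueError("every admissible grouping has a1 = 0")
    return min(slopes), max(slopes)


print(slope_bounds([0, 1, 2, 3, 4, 5], [1.1, 2.8, 5.3, 6.9, 9.2, 10.8], 1.5))
```

The number of groupings grows combinatorially with the number of points in $[\tilde x - c, \tilde x + c]$, which is the computational burden discussed in the following paragraphs.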

Let $g$ be a grouping contained in $S$ and denote by $I_g$ the confidence interval for $\alpha$ which we obtain from formula (23) using the grouping $g$. Denote further by $I$ the smallest interval which contains the intervals $I_g$ for all elements $g$ of $S$. Then $I$ also contains the confidence interval corresponding to the grouping according to Rule II. If we denote by $P$ the chosen probability level (say $P = .95$), then we can say: if we were to draw a sample consisting of $N$ pairs of observations $(x_1, y_1), \ldots, (x_N, y_N)$, the probability is greater than or equal to $P$ that we shall obtain a system of observations such that the interval $I$ will include the true slope $\alpha$.

The computing work for the determination of $I$ may be considerable if the number of observations within the interval $[\tilde x - c, \tilde x + c]$ is not small. We can get a good approximation to $I$ with less computation as follows. First we calculate the slope $a'$ using the grouping according to Rule I and determine the confidence interval $[a' - \delta, a' + \Delta]$ according to formula (23). Denote by $a(g)$ the value of the slope, i.e. the value of $a_2/a_1$, corresponding to a grouping $g$ of the system $S$, and by $[a(g) - \delta_g, a(g) + \Delta_g]$ the corresponding confidence interval calculated from (23). Neglecting the differences $\delta_g - \delta$ and $\Delta_g - \Delta$, we obtain for $I$ the interval $[a^* - \delta, a^{**} + \Delta]$.

If the difference $a^{**} - a^*$ is small, we can consider $\bar I = [a^* - \delta, a^{**} + \Delta]$ as the correct confidence interval of $\alpha$ corresponding to the chosen probability level $P$. If, however, $a^{**} - a^*$ is large, the interval $\bar I$ is unnecessarily large. In such cases we may get a much shorter confidence interval by using some other grouping defined independently of the errors $\epsilon_1, \ldots, \epsilon_N;\ \eta_1, \ldots, \eta_N$. For instance, if we see that the values $x_1, \ldots, x_N$, considered in the order in which they have been observed, show a monotonically increasing (or decreasing) tendency, we shall define the group $G_1$ as the first half and the group $G_2$ as the second half of the observations. Though we decide to make this grouping after having observed that the values $x_1, \ldots, x_N$ show a clear trend, the grouping can be considered as independent of the errors $\epsilon_1, \ldots, \epsilon_N$. In fact, if the range of the error $\epsilon$ is small in comparison to the true part $X$, the trend of the values $x_1, \ldots, x_N$ will not be affected by the size of the errors $\epsilon_1, \ldots, \epsilon_N$. We may also use for the grouping any other property of the data which is independent of the errors.

The results of the preceding considerations can be summarized as follows. We first use the grouping according to Rule I, and calculate the slope $a' = (\bar y_1 - \bar y_2)/(\bar x_1 - \bar x_2)$ and the corresponding confidence interval $[a' - \delta, a' + \Delta]$ (formula (23)). This confidence interval cannot be considered as exact, since the grouping according to Rule I is not completely independent of the errors. In order to take account of this fact, we calculate $a^*$ and $a^{**}$. If $a^{**} - a^*$ is small, we consider $\bar I = [a^* - \delta, a^{**} + \Delta]$, as a practical approximation, to be the correct confidence interval. If, however, $a^{**} - a^*$ is large, the interval $\bar I$ is unnecessarily large. We can only say that $\bar I$ is a confidence interval corresponding to a probability level greater than or equal to the chosen one. In such cases we should try to use some other grouping defined independently of the errors, which may lead to a considerably shorter confidence interval.

Analogous considerations hold regarding the joint confidence region for $\alpha$ and $\beta$. We use the grouping according to Rule I and calculate from (27) the corresponding confidence region $R$. If $|a^{**} - a^*|$ and $|b^{**} - b^*|$ are small ($b^* = \bar y - a^*\bar x$ and $b^{**} = \bar y - a^{**}\bar x$), we enlarge $R$ to a region $\bar R$ corresponding to the fact that $a$ and $b$ may take any values within the interval $[a^*, a^{**}]$ and the interval with endpoints $b^*$ and $b^{**}$ respectively. The region $\bar R$ can be considered, as a practical approximation, to be the correct confidence region. If $|a^{**} - a^*|$ or $|b^{**} - b^*|$ is large, we may try some other grouping defined independently of the errors, which may lead to a smaller confidence region. In any case $\bar R$ represents a confidence region corresponding to a probability level greater than or equal to the chosen one.

8. Some Remarks on the Consistency of the Estimates of $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. We have shown in section 3 that the given estimates of $\alpha$, $\beta$, $\sigma_\epsilon$ and $\sigma_\eta$ are consistent if condition V is satisfied.

If the values $X_1, \ldots, X_N$ are not obtained by random sampling, it will in general be possible to define a grouping which is independent of the errors and for which condition V is satisfied. We can sometimes arrange the experiments such that no values of the series $x_1, \ldots, x_N$ lie within the interval $[\tilde x - c, \tilde x + c]$, where $\tilde x$ denotes the median of $x_1, \ldots, x_N$ and $c$ the range of the error $\epsilon$. In such cases, as we saw, the grouping according to Rule I is independent of the errors. Condition V is certainly satisfied if we group the data according to Rule I.

Let us now consider the case where $X_1, \ldots, X_N$ are independently distributed random variables, each having the same distribution. Denote by $X$ a random variable having the same probability distribution as possessed by each of the random variables $X_1, \ldots, X_N$. Assuming that $X$ has a finite second moment, the expression in condition V will approach zero stochastically as $N \to \infty$ for any grouping defined independently of the values $X_1, \ldots, X_N$. It is possible, however, to define a grouping independent of the errors (but not independent of $X_1, \ldots, X_N$) for which the expression in V does not approach zero, provided that $X$ has the following property: there exists a real value $\lambda$ such that the probability that $X$ will lie within the interval $[\lambda - c, \lambda + c]$ ($c$ denotes the range of the error $\epsilon$) is zero, the probability that $X > \lambda + c$ is positive, and the probability that $X < \lambda - c$ is positive. The grouping can be defined, for instance, as follows:

The $i$-th observation $(x_i, y_i)$ belongs to the group $G_1$ if $x_i < \lambda$ and to $G_2$ if $x_i > \lambda$. We continue the grouping according to this rule up to a value $i$ for which one of the groups $G_1$, $G_2$ already contains $N/2$ elements. All further observations belong to the other group.
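The sequential grouping just described can be sketched as follows. The argument `lam` is the real value, written $\lambda$ here, that separates the two parts of the distribution of $X$ and must be known in advance. The name `threshold_grouping` and the handling of $x_i = \lambda$ (an event of probability zero under the stated property; the sketch sends it to $G_2$) are assumptions of the sketch.

```python
def threshold_grouping(x, lam):
    """Index lists (G1, G2) built by scanning the observations in their original order."""
    half = len(x) // 2
    g1, g2 = [], []
    for i, xi in enumerate(x):
        if len(g1) == half:        # G1 full: all further observations go to G2
            g2.append(i)
        elif len(g2) == half:      # G2 full: all further observations go to G1
            g1.append(i)
        elif xi < lam:
            g1.append(i)
        else:                      # xi > lam (xi == lam also lands here)
            g2.append(i)
    return g1, g2


print(threshold_grouping([0.1, 0.9, 0.2, 0.3, 0.8, 0.05], 0.5))  # ([0, 2, 3], [1, 4, 5])
```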

It is easy to see that the probability is equal to 1 that the relation $x_i < \lambda$ is equivalent to the relation $X_i < \lambda - c$ and the relation $x_i > \lambda$ is equivalent to the relation $X_i > \lambda + c$. Hence this grouping is independent of the errors. Since condition V is satisfied for this grouping, our statement is proved.

If $X$ does not have the property described above, it may happen that for every grouping defined independently of the errors, the expression in condition V always converges stochastically to zero. Such a case arises, for instance, if $X$, $\epsilon$ and $\eta$ are normally distributed.¹⁴ It can be shown that in this case no consistent estimates of the parameters $\alpha$ and $\beta$ can be given, unless we have some additional information not contained in the data (for instance, if we know a priori the ratio $\sigma_\epsilon/\sigma_\eta$).

9. Structural Relationship and Prediction.** The problem discussed in this paper was how to estimate the relationship between the true parts $X$ and $Y$. We shall call the relationship between the true parts the structural relationship. The problem of finding the structural relationship must not be confused with the problem of predicting one variable by means of the other. The problem of prediction can be formulated as follows: we have observed $N$ pairs of values $(x_1, y_1), \dots, (x_N, y_N)$. A new observation on $x$ is given, and we have to estimate the corresponding value of $y$ by means of our previous observations $(x_1, y_1), \dots, (x_N, y_N)$. One might think that if we have estimated the structural relationship between $X$ and $Y$, we may estimate $y$ by the same relationship. That is to say, if the estimated structural relationship is given by $Y = aX + b$, we may estimate $y$ from $x$ by the same formula, $y = ax + b$. This procedure may, however, lead to a biased estimate of $y$. This is the case, for instance, if $X$, $\epsilon$ and $\eta$ are normally distributed. It can easily be shown that in this case, for any given $x$, the conditional expectation of $y$ is a linear function of $x$, that the slope of this function differs from the slope of the structural relationship, and that among all unbiased estimates of $y$ which are linear functions of $x$, the estimate obtained by the method of least squares has the smallest variance. Hence in this case we have to use the least-squares estimate for purposes of prediction. Even if we knew the structural relationship $Y = \alpha X + \beta$ exactly, we would get a biased estimate of $y$ by putting $y = \alpha x + \beta$.
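A small simulation makes the bias concrete. The sketch below is only an illustration under assumed values that do not appear in the paper: $X \sim N(0, 1)$, $\epsilon \sim N(0, 1)$, $\eta \sim N(0, 0.5^2)$ and a structural line $Y = 2X + 1$. Under these assumptions the least-squares slope of $y$ on $x$ should settle near $\alpha\sigma_X^2/(\sigma_X^2 + \sigma_\epsilon^2) = 1$ rather than near the structural slope $\alpha = 2$.

```python
import random

def simulate(n=100_000, alpha=2.0, beta=1.0, seed=1):
    """Draw x = X + eps and y = alpha*X + beta + eta with normal X, eps, eta (assumed values)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(0.0, 1.0)                               # true part X
        xs.append(true_x + rng.gauss(0.0, 1.0))                    # observed x = X + eps
        ys.append(alpha * true_x + beta + rng.gauss(0.0, 0.5))     # observed y = Y + eta
    return xs, ys

def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

xs, ys = simulate()
slope = least_squares_slope(xs, ys)
print(round(slope, 3))   # close to 1.0, not to the structural slope 2.0
```

The least-squares line estimates the conditional expectation of $y$ given $x$, which is what prediction from an error-laden $x$ needs; the structural slope answers a different question.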

Let us now consider the following example: $X$ is a random variable having a rectangular distribution with the range $[0, 1]$, and the random variable $\epsilon$ has a rectangular distribution with the range $[-0.1, +0.1]$. For any given $x$ let us denote the conditional expectation of $y$ by $E(y \mid x)$ and the conditional expectation of $X$ by $E(X \mid x)$. Then we obviously have

$$E(y \mid x) = \alpha E(X \mid x) + \beta.$$

Now let us calculate $E(X \mid x)$. It is obvious that the joint distribution of $X$ and $\epsilon$ is given by the density function

$$5\, dX\, d\epsilon,$$

where $X$ can take any value within the interval $[0, 1]$ and $\epsilon$ can take any value within $[-0.1, +0.1]$. From this we easily obtain that the joint distribution of $x$ and $X$ is given by the density function

$$5\, dx\, dX,$$

where $x$ can take any value within the interval $[-0.1, 1.1]$ and $X$ can take any value lying simultaneously in both intervals $[0, 1]$ and $[x - 0.1, x + 0.1]$. Denote by $I_x$ the common part of these two intervals. Then for any fixed $x$ the conditional distribution of $X$ is given by the probability density

$$\frac{dX}{\int_{I_x} dX}.$$

Hence we have

$$E(X \mid x) = \frac{\int_{I_x} X\, dX}{\int_{I_x} dX}.$$

* I wish to thank Professor Hotelling for drawing my attention to this case.

** I should like to express my thanks to Professor Hotelling for many interesting suggestions and remarks on this subject.

We have to consider three cases:

(1) $0.1 < x < 0.9$. In this case $I_x = [x - 0.1, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{x+0.1} X\, dX}{\int_{x-0.1}^{x+0.1} dX} = x.$$

(2) $-0.1 < x < 0.1$. Then $I_x = [0, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{0}^{x+0.1} X\, dX}{\int_{0}^{x+0.1} dX} = 0.5x + 0.05.$$

(3) $0.9 < x < 1.1$. Then $I_x = [x - 0.1, 1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{1} X\, dX}{\int_{x-0.1}^{1} dX} = 0.5x + 0.45.$$
Since

$$E(y \mid x) = \alpha E(X \mid x) + \beta,$$

we see that the structural relationship gives an unbiased prediction of $y$ from $x$ if $0.1 < x < 0.9$, but not in the other cases.
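The three cases can be checked numerically. The sketch below (the function names are illustrative) computes $E(X \mid x)$ directly as the ratio of the two integrals over $I_x$ and compares it with the piecewise results above; multiplying by $\alpha$ and adding $\beta$ then gives the unbiased prediction $E(y \mid x)$.

```python
def conditional_mean_X(x, c=0.1):
    """E(X | x) when X is uniform on [0, 1] and x = X + eps, eps uniform on [-c, c].

    For fixed x the conditional density of X is constant on
    I_x = [0, 1] intersected with [x - c, x + c], so
    E(X | x) = (integral of X dX over I_x) / (integral of dX over I_x).
    """
    lo, hi = max(0.0, x - c), min(1.0, x + c)
    if lo >= hi:
        raise ValueError("x must lie strictly inside (-c, 1 + c)")
    return ((hi * hi - lo * lo) / 2.0) / (hi - lo)

def text_formula(x):
    """The three cases worked out above (c = 0.1)."""
    if x < 0.1:
        return 0.5 * x + 0.05      # case (2)
    if x > 0.9:
        return 0.5 * x + 0.45      # case (3)
    return x                       # case (1)

for x in (-0.05, 0.05, 0.3, 0.75, 0.95, 1.05):
    print(x, conditional_mean_X(x), text_formula(x))
```

Only in the middle range does $E(X \mid x)$ coincide with $x$ itself, which is why plugging $x$ into the structural line is unbiased there and biased near the ends.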

The question of the cases in which the structural relationship is also appropriate for purposes of prediction needs further investigation. I should like to mention a class of cases where the structural relationship has to be used for prediction as well. Assume that we have observed $N$ values $(x_1, y_1), \dots, (x_N, y_N)$ of the variables $x$ and $y$ for which conditions I-IV of section 2 hold. Then we make a new observation on $x$, obtaining the value $x'$. We assume that the last observation on $x$ has been made under changed conditions such that we are sure that $x'$ contains no error, i.e., $x'$ is equal to the true part $X'$. Such a situation may arise, for instance, if the error $\epsilon$ is due to errors of measurement and the last observation has been made with an instrument of great precision, for which the error of measurement can be neglected. In such cases the prediction of the corresponding $y'$ has to be made by means of the estimated structural relationship, i.e., we have to put $y' = ax' + b$.

The knowledge of the structural relationship is essential for constructing any 
theory in the empirical sciences. The laws of the empirical sciences mostly 
express relationships among a limited number of variables which would prevail 
exactly if the disturbing influence of a great number of other variables could 
be eliminated. In our experiments we never succeed in eliminating completely 
these disturbances. Hence in deducing laws from observations, we have the 
task of estimating structural relationships. 

Columbia University,

New York, N. Y. 



A METHOD FOR MINIMIZING THE SUM OF ABSOLUTE VALUES OF DEVIATIONS

By Robert R. Singleton 

1. Introduction. In the Philosophical Magazine, 7th series, May 1930, E. C. Rhodes described a method of computation for the estimation of parameters by minimizing the sum of absolute values of deviations. His is an iterative and recursive method, in the following sense. There is a direct method for minimization with one parameter. Assuming a method for minimization with $n - 1$ parameters, Rhodes imposes a relation between the $n$ parameters (in an $n$-parameter problem) and finds a restricted minimum by the method for $n - 1$ parameters. In this sense his method is recursive. He then repeats the process, imposing on the $n$ parameters a new relation determined by the restricted minimum. In this sense his method is iterative. The process is finite, ending when a restricted minimum immediately succeeds itself, indicating a true minimum.

Rhodes' paper presents the method without proof. The purpose of the present paper is to analyze the situation in sufficient detail to indicate proofs for various methods, and to present a new method which reduces the labor of solution by eliminating the recursive feature. The iterative approach is retained. The solution of Rhodes' illustrative problem will be given for comparison between the two methods.

The paper uses geometric terminology and develops at some length the geometry of a surface representing the summed absolute deviations. This seems the clearest means of presenting the relationships. Further analysis of the properties of this surface should lead to an even more direct method for attaining the minimum than the one presented here.

In the writing of the paper, no attention has been given to sets of observations or equations among which a linear dependence may exist. In practice, such a situation almost never occurs. If the need arises, the adjustments which must be made to take care of dependence are in each case fairly obvious.

2. Geometric Analogue of Summed Absolute Deviations. Let $n$ observations on $\nu + 1$ variates be represented by $x^i_\alpha$, $y^i$, where $i = 1, \dots, n$; $\alpha = 1, \dots, \nu$. Unless otherwise noted, latin indices have range 1 to $n$, greek indices 1 to $\nu$. The summation convention of tensor analysis is used.

The variates are to be statistically related by the linear function*

$$\bar y^i = x^i_\alpha u^\alpha,$$

* This includes the linear function with a constant, since a variate equal to 1 for every observation may be used.
$\bar y^i$ being an estimate of $y^i$. The $u^\alpha$ are to be determined so that $v = \sum_i |\bar y^i - y^i|$ is a minimum. Set

$$v^i = x^i_\alpha u^\alpha - y^i \tag{1}$$

and determine functions $e^i(u^\alpha)$ so that $e^i v^i \ge 0$ and $|e^i| = 1$. It is immaterial that $e^i$ is not uniquely determined when $u^\alpha$ satisfies $v^i = 0$. Then $v = \sum_i e^i v^i$ is to be minimized. Using (1),

$$v = x_\alpha u^\alpha - y, \tag{2}$$

where

$$x_\alpha = \sum_i e^i x^i_\alpha, \qquad y = \sum_i e^i y^i.$$

Consider a Euclidean $(\nu + 1)$-space, $E_{\nu+1}$, with coordinates $u^\alpha$, $v$. The coordinate hyperplane perpendicular to the $v$-axis will be called $E_\nu$. In $E_{\nu+1}$ each of equations (1) for a particular $i$ represents a $\nu$-plane which intersects $E_\nu$ in a $(\nu - 1)$-plane when $v^i = 0$. Each of the equations

$$|v^i| = e^i (x^i_\alpha u^\alpha - y^i) \tag{3}$$

represents two half-planes which touch $E_\nu$ and each other along the $(\nu - 1)$-plane given in $E_\nu$ by the equation

$$x^i_\alpha u^\alpha - y^i = 0. \tag{4}$$

The functions on the right-hand side of (3) are thus continuous everywhere, and linear in any neighborhood of $E_\nu$ none of whose points satisfies (4). Since a sum of functions continuous and linear in a neighborhood is also continuous and linear in that neighborhood, it follows that the function on the right in (2) is continuous for all $u$, and linear in every neighborhood of $E_\nu$ containing no points which satisfy (4) for any $i$. Hence

Observation I: The surface $S$ given in $E_{\nu+1}$ by (2) consists of portions of $\nu$-planes joined together. The projection of these joins on $E_\nu$ forms a network of $(\nu - 1)$-planes determined in $E_\nu$ by equations (4).
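As a concrete check of (1) and (2), the following sketch evaluates $v$ at a point for a small hypothetical data set ($x^i = (1, t_i)$ with $t = 0, 1, 2, 3$ and $y = (0, 2, 1, 3)$, so that $u = (u^1, u^2)$ is an intercept and a slope), forms $x_\alpha$ and $y$ from the signs $e^i$, and confirms that $v = x_\alpha u^\alpha - y$ holds at that point and at a nearby point on the same face. The helper names are illustrative.

```python
def residuals(X, y, u):
    """v^i = x^i_alpha u^alpha - y^i, equation (1)."""
    return [sum(a * b for a, b in zip(row, u)) - yi for row, yi in zip(X, y)]

def total_deviation(X, y, u):
    """v(u) = sum_i |v^i|, the quantity to be minimized."""
    return sum(abs(vi) for vi in residuals(X, y, u))

def face_coefficients(X, y, u):
    """x_alpha = sum_i e^i x^i_alpha and y = sum_i e^i y^i for the face containing u.

    e^i = +1 where v^i >= 0 and -1 otherwise (the choice at v^i = 0 is arbitrary).
    """
    e = [1 if vi >= 0 else -1 for vi in residuals(X, y, u)]
    x_alpha = [sum(ei * row[a] for ei, row in zip(e, X)) for a in range(len(u))]
    return x_alpha, sum(ei * yi for ei, yi in zip(e, y))

def linear_form(x_alpha, y_sum, u):
    """Right-hand side of (2): x_alpha u^alpha - y."""
    return sum(a * b for a, b in zip(x_alpha, u)) - y_sum

# hypothetical data: x^i = (1, t_i), so u = (intercept, slope)
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 2, 1, 3]
u = [0.5, 0.8]
x_alpha, y_sum = face_coefficients(X, y, u)
print(x_alpha, y_sum)                                        # [0, -2] -4
print(total_deviation(X, y, u), linear_form(x_alpha, y_sum, u))
```

Moving $u$ slightly without crossing any plane (4) leaves the signs $e^i$, and hence $x_\alpha$ and $y$, unchanged, which is the linearity on each face stated in Observation I.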

3. Existence of a Minimum. Define a "bend of degree $r$ on $S$" to be the locus of all points on $S$ whose $u$-coordinates satisfy a set of $r$ independent equations of (4). To each set of $r$ independent equations corresponds a unique bend of degree $r$.

If a linear relation $u^\alpha = a^\alpha_\rho s^\rho + b^\alpha$, $\rho = 1, \dots, p < \nu$, $\operatorname{rank}(a^\alpha_\rho) = p$, is imposed on $u^\alpha$, all the preceding development, reduced in dimension, applies to the new variates $x^i_\alpha a^\alpha_\rho$, $y^i - x^i_\alpha b^\alpha$. Hence

Observation II: A section of $S$ by a plane of any dimension $d < \nu$ has all the properties of an $S$-surface of dimension $d$.

Since any set of consistent equations selected from (4) determines such a linear relation, the application of Observation I to any of the bends of $S$ shows that each $r$-bend consists of linear elements of dimension $\nu - r$, joined at points which lie on linear elements of lesser dimension. Thus $S$ is a polyhedron. Its faces we term complexes of dimension $\nu$, $C_\nu$, and the linear elements of its edges which lie wholly in bends of degree $r$, but not of degree $r + 1$, are complexes $C_{\nu-r}$ of dimension $\nu - r$. The boundary of any $C_a$, $a > 0$, consists of complexes of lesser dimension. The term complex is not restricted to either open or closed complexes.

Since the function $v(u^\alpha)$ of (2) is non-negative, it possesses a greatest lower bound (g.l.b.) $g$. Since for some number $h > g$ there exists an $N$ such that $v(u^\alpha) > h$ for all $|u^\alpha| > N$, it follows that for some closed, bounded neighborhood in $E_\nu$ the g.l.b. of $v$ is $g$. Since $v$ is continuous everywhere, it attains its g.l.b., and so $S$ has minimum points. Since the minimum of any complex not parallel to $E_\nu$ lies on its boundary, and the boundary consists of complexes, it follows that the minimum points of $S$ consist of $C_0$'s and/or entire complexes of dimension $> 0$ which are parallel to $E_\nu$. The next section will show that $S$ has a unique minimum complex (including, of course, its boundary complexes) and furthermore is cup-shaped.


[Fig. 1]



4. Convexity Property; Uniqueness of the Minimum. Consider $\nu = 1$ in the preceding treatment (the index $\alpha$ is, for convenience, not written). $S$ looks generally like Fig. 1. The slope changes only where an equation of (4) has a root. Suppose the point is $u_0$, and $v^k(u_0) = 0$. From (3), since $|v^k| \ge 0$, it follows that $e^k x^k < 0$ for $u < u_0$ and $e^k x^k > 0$ for $u > u_0$. Since in (2) $x = \sum_i e^i x^i$, and since for $h$ sufficiently small and $u_0 - h < u < u_0 + h$ the only $e$ to change value* is $e^k$, we have

$$x(u_1) + 2|e^k x^k| = x(u_2),$$

where

$$u_0 - h < u_1 < u_0 < u_2 < u_0 + h.$$

Hence the slope is a monotonic increasing step function. Since for $u$ sufficiently small all $e^i x^i < 0$, and for $u$ sufficiently large all $e^i x^i > 0$, at some intermediate point or points either the slope is zero or it changes from negative to positive without becoming zero. In the first case a single closed $C_1$ is the minimum complex; in the second, a $C_0$. In either case the curve given by (2) when $\nu = 1$ is concave upward and has just one minimum complex, except for complexes of lesser dimension constituting the boundary of this complex. An obvious consequence is

Lemma I. The set of points $u$ for which $v$ is less than some number $N$ forms a convex point set.

* The $e$'s corresponding to equations proportional to the $k$th equation of (4) also change value at $u_0$. This does not destroy the argument.

This result is easily extended to the general dimension $\nu$. If for any two points $u_1$, $u_2$ of $E_\nu$, $v(u_1) < N$ and $v(u_2) < N$, the plane in $E_{\nu+1}$ given by $u^\alpha = u_1^\alpha + \lambda(u_2^\alpha - u_1^\alpha)$ makes a one-dimensional section of $S$. By Observation II, the points $u$ lying on the projection of this section on $E_\nu$ have the property of Lemma I, and of course lie on the straight line joining $u_1$ and $u_2$. This is the property required for a convex point set. Hence

Theorem I. The set of points $u^\alpha$ of $E_\nu$ for which $v(u^\alpha)$ as given by (2) is less than a fixed quantity forms a convex point set.

From this it follows immediately that there is a unique minimum complex. It is appropriate here to point out that no two complexes can be contained in a single plane of the same dimension. This follows from the equation giving monotonicity of slope in one dimension, and Observation II.

5. Gradient Directions. From here on the treatment will be of $v$ as a function defined on $E_\nu$, and the equations will represent objects in $E_\nu$, unless otherwise stated. Complex and bend will also refer to the projections on $E_\nu$ of the complexes and bends of $S$. For a single-valued function defined on $E_\nu$, the gradient at a point is the projection of a normal to the surface representing the function in $E_{\nu+1}$. If the function is defined only over a subspace of $E_\nu$ possessing derivatives, the gradient will be required also to be tangent to the subspace. This is sufficient to determine a unique direction, and preserves the property that for an infinitesimal displacement the value of the function decreases most rapidly in the direction of the gradient. Here the gradient is taken with the sign opposite to its usual sense.

A point $u$ lying on a $C_r$ but not on a $C_{r-1}$ will have a gradient in $C_r$ and also in each higher-dimensional complex on whose boundary $C_r$ lies. If the gradient for $u$ as a point of $C_{r+k}$ points into $C_{r+k}$ (remembering that $u$ lies on the boundary), this will be called a usable gradient. In the case of the greatest $k$ for which there exists a usable gradient, there exists but one $C_{r+k}$ providing such a gradient, and that gradient is the "best" gradient; that is, of all directions in $E_\nu$ it provides the direction of most rapid decrease of the function $v$. This follows from Theorem I. Furthermore, all complexes of lesser dimension providing usable gradients lie on the boundary of this $C_{r+k}$. In fact

Theorem II. If for a point $u$ on $C_r$, two complexes $C_s$ and $C'_s$, $s > r$, lying in different bends of degree $\nu - s$ but incident at $C_r$, both provide usable gradients for $u$, then the complex $C_{s+1}$ on whose boundary lie both $C_s$ and $C'_s$ also provides a usable gradient for $u$.
This follows from Theorem I. Select $u_1$ on the gradient in $C_s$ and $u_2$ on the gradient in $C'_s$, for which $v(u_1) = v(u_2)$. The join of $u_1$ and $u_2$ lies in $C_{s+1}$, and for some point $u_3$ on this join, $v(u_3)$ is less than $v(u_1) = v(u_2)$. Also, the distance $uu_3$ is less than at least one of $uu_1$, $uu_2$. Hence $C_{s+1}$ must contain a usable gradient.


6. Selection of Best Gradient at Bends. The direction of the gradient for a point $u_0$ considered as lying on a $C_\nu$ is given by

$$g^\alpha = -\delta^{\alpha\beta} x_\beta. \tag{5}$$

If $u_0$ lies in the interior of a face, this is unique. If $u_0$ lies in a bend, so that some $e$ are not determined, the $g^\alpha$ for each face is found by selecting the indeterminate $e$'s as $+1$ or $-1$, according to the face being considered.

For a point $u_0$ considered as lying on a bend of degree $r$, given by $r$ independent equations of (4):

$$x^\lambda_\alpha u^\alpha - y^\lambda = 0, \qquad (\lambda = 1, \dots, r), \tag{6}$$

the gradient for a particular $C_{\nu-r}$, determined by the conditions at the beginning of section 5, is

$$g^\alpha = -\delta^{\alpha\beta}\,(x_\beta - k_\lambda x^\lambda_\beta), \tag{7}$$

where $k_\lambda$ satisfies

$$\delta^{\alpha\beta} x^\mu_\alpha x_\beta = k_\lambda\, \delta^{\alpha\beta} x^\lambda_\alpha x^\mu_\beta, \qquad (\mu = 1, \dots, r),$$

and $x_\alpha$ is as given in (2), the choice of sign for the indeterminate $e^\lambda$ $(\lambda = 1, \dots, r)$ being immaterial. They may, in fact, be taken as 0 in this instance.

For a point $u_0$ lying on an $r$-bend given by (6), to determine which complex contains the best gradient, each $(r - 1)$-bend incident on the $r$-bend at $u_0$ is tested for a usable gradient. Theorem II then determines the complex containing the best gradient.

There are $2r$ such complexes incident at $u_0$, given by the $r$ sets of equations selected from (6):

$$(8\lambda):\quad x^\mu_\alpha u^\alpha - y^\mu = 0 \qquad (\mu = 1, \dots, \lambda - 1, \lambda + 1, \dots, r), \qquad (\lambda = 1, \dots, r).$$

The two complexes lying in the same $(r - 1)$-bend have the same equations in (8), but are distinguished later by $e^\lambda(u_0)$ for the omitted equation being taken first $+1$, then $-1$.

The gradient for the $\lambda$th pair of complexes is

$$g^\alpha_\lambda = -\delta^{\alpha\beta}\,(x_\beta - h_{\lambda\mu} x^\mu_\beta), \qquad \mu \ne \lambda,$$

similar to (7), but not identical. For $e^\lambda = +1$ in determining $x_\alpha$ we have $g^\alpha_{\lambda+}$, and for $e^\lambda = -1$, $g^\alpha_{\lambda-}$. We restrict the consideration to $e^\lambda = +1$.
The line in the direction of greatest slope is then

$$u^\alpha = u_0^\alpha + g^\alpha_{\lambda+}\, t.$$

Now $u_0$ is here considered as lying on the complex given by $(8\lambda)$ with $e^\lambda = +1$. In order that $g_{\lambda+}$ point into this face, the deviation for the $\lambda$th observation must exceed 0 when $t > 0$; otherwise, for a displacement in the direction of $g_{\lambda+}$, $e^\lambda$ changes sign immediately and the course is in the other complex. This deviation is

$$v^\lambda = x^\lambda_\alpha u^\alpha - y^\lambda = x^\lambda_\alpha u_0^\alpha - y^\lambda + x^\lambda_\alpha g^\alpha_{\lambda+} t = x^\lambda_\alpha g^\alpha_{\lambda+} t.$$

Had $g_{\lambda-}$ been used, this deviation would have to be less than 0. Hence a necessary and sufficient condition that a complex given by (8) with either choice of $e^\lambda$ possess a usable gradient is

$$\Phi_\lambda = e^\lambda\, \delta^{\alpha\beta} x^\lambda_\alpha\,(h_{\lambda\mu} x^\mu_\beta - x_\beta) > 0. \tag{9}$$

For $r = 1$ the condition is given by (9) with the first sum merely omitted. $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ cannot both exceed 0.

When all sets of equations $(8\lambda)$ are tested by (9), the equations common to all sets possessing a usable gradient determine the complex with the best gradient, retaining the values of $e$ for which (9) was satisfied.
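The gradient (7) is the negative of $x_\alpha$ with its components along the bend normals $x^\lambda_\alpha$ removed, and (9) tests each of the $2r$ incident complexes. The sketch below implements both for small problems. It sets the indeterminate $e$'s of the bend equations to 0, as the text permits; the helper names and the data (hypothetical, $x^i = (1, t_i)$ with $t = 0, 1, 2, 3$ and $y = (0, 2, 1, 3)$, examined at $u = (0, 1)$ where observations 0 and 3 have zero deviation) are illustrative and not Singleton's.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def solve(A, rhs):
    """Solve the square, non-singular system A k = rhs (Gauss-Jordan, partial pivoting)."""
    n = len(rhs)
    M = [[float(v) for v in row] + [float(b)] for row, b in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def bend_gradient(x_alpha, bend_rows):
    """Equation (7): g = -(x - k_l x^l), with k chosen so that g is orthogonal to each x^l.

    With no bend rows this reduces to (5), g = -x.
    """
    if not bend_rows:
        return [-c for c in x_alpha]
    gram = [[dot(p, q) for q in bend_rows] for p in bend_rows]
    k = solve(gram, [dot(p, x_alpha) for p in bend_rows])
    return [-(c - sum(kl * row[a] for kl, row in zip(k, bend_rows)))
            for a, c in enumerate(x_alpha)]

def usable_gradient_test(x_star, bend_rows):
    """Evaluate Phi of (9) for each of the 2r complexes incident at a point on an r-bend.

    x_star = sum of e^i x^i over the observations NOT in the bend; the bend's own
    e's are taken as 0 except the omitted one, which is set to +1 or -1.
    Returns (lambda, e_lambda, Phi) triples; Phi > 0 marks a usable gradient.
    """
    results = []
    for lam, row in enumerate(bend_rows):
        others = bend_rows[:lam] + bend_rows[lam + 1:]
        for e in (1, -1):
            x_alpha = [c + e * a for c, a in zip(x_star, row)]
            g = bend_gradient(x_alpha, others)
            results.append((lam, e, e * dot(row, g)))
    return results

# observation 1 has e = -1 and observation 2 has e = +1, so x* = -x^1 + x^2 = (0, 1)
x_star = [0, 1]
for lam, e, phi in usable_gradient_test(x_star, [[1, 0], [1, 3]]):
    print(lam, e, round(phi, 6))     # all negative: no usable gradient at this point
```

Solving the small Gram system directly keeps the sketch close to the definition of $k_\lambda$ in (7); for larger $\nu$ a library linear solver would be the natural substitute.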

7. Property of the Minimum Point. For a minimum point, given by (6) with $r = \nu$, all $\Phi_\lambda$ must be negative. Define $X^{\lambda\mu} = \delta^{\alpha\beta} x^\lambda_\alpha x^\mu_\beta$ and $X^\lambda = \delta^{\alpha\beta} x^\lambda_\alpha x_\beta$ for convenience. Then in (9), the numbers $h_{\lambda\mu}$, $-1$ are seen from their definition in (7) to be proportional to the cofactors of the $\lambda$th row of the matrix $(X^{\mu\eta}, X^\mu)$, $\mu$ and $\eta$ having the same range as $\lambda$; here $(X^{\mu\eta}, X^\mu)$ denotes the matrix $(X^{\mu\eta})$ with its $\lambda$th column replaced by $(X^\mu)$. Thus $\Phi_{\lambda+} = c\,\mathrm{Det}(X^{\mu\eta}, X^\mu)$ and $\Phi_{\lambda-} = -c\,\mathrm{Det}(X^{\mu\eta}, X^\mu)$, where in the first case $X^\mu$ is determined with $e^\lambda = +1$, in the second with $e^\lambda = -1$. The factor of proportionality $c$ must be the same, since $X^{\mu\eta}$ is unaffected by the change of $e^\lambda$. Now let $X'^\mu = \delta^{\alpha\beta} x^\mu_\alpha x^*_\beta$, where $x^*_\alpha = \sum_\kappa e^\kappa x^\kappa_\alpha$, the range of $\kappa$ omitting the range of $\lambda$. Then

$$\Phi_{\lambda+} = c\,[\mathrm{Det}(X^{\mu\eta}, X'^\mu) + \mathrm{Det}(X^{\mu\eta}, X^{\mu\lambda})]$$

and

$$\Phi_{\lambda-} = -c\,[\mathrm{Det}(X^{\mu\eta}, X'^\mu) - \mathrm{Det}(X^{\mu\eta}, X^{\mu\lambda})].$$

Hence

$$\Phi_{\lambda+}\Phi_{\lambda-} = -c^2\,\big([\mathrm{Det}(X^{\mu\eta}, X'^\mu)]^2 - [\mathrm{Det}(X^{\mu\eta}, X^{\mu\lambda})]^2\big).$$

Now let $A$ represent the square matrix $(x^\lambda_\alpha)$, $\alpha$ giving the rows and $\lambda$ the columns. Let $B_\lambda$ represent the matrix formed from $A$ by replacing the $\lambda$th column by $x^*_\alpha$. Then

$$\Phi_{\lambda+}\Phi_{\lambda-} = -c^2\,[\mathrm{Det}^2(A'B_\lambda) - \mathrm{Det}^2(A'A)] = -c^2\,\mathrm{Det}^2 A\,(\mathrm{Det}^2 B_\lambda - \mathrm{Det}^2 A),$$

and this will have the same sign as

$$\Psi_\lambda = |\mathrm{Det}(A)| - |\mathrm{Det}(B_\lambda)|.$$

Since $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ are never both positive, and at the minimum are both negative for all $\lambda$, at the minimum all $\Psi_\lambda > 0$. To determine all $\Psi_\lambda$ together, let, in matrix notation, $z' = (z_1, \dots, z_\nu)$ and $x^{*\prime} = (x^*_1, \dots, x^*_\nu)$, where the $x^*_\alpha$ were defined previously. Determine $z$ as the solution of $Az = x^*$. Then the $|\mathrm{Det}(B_\lambda)|$ are equal to $|z_\lambda|\,|\mathrm{Det}(A)|$. Hence a necessary and sufficient condition that $\Psi_\lambda > 0$ for all $\lambda$ is that all $|z_\lambda|$ be less than one. Hence

Theorem III. If a zero-complex is given by a set of equations whose matrix is $M$, a necessary and sufficient condition that the complex be a unique minimum is that the solutions of $M'z = x^*$ be all less than one in absolute value. If $k$ of the solutions are equal to one in absolute value, and the rest are less than one, the minimum is a complex of dimension $k$ with the zero-complex as one of its corners.

The last statement follows since, if one solution is 1 in absolute value, a corresponding $\Phi_\lambda = 0$, and hence no gradient, usable or not, exists. Thus the corresponding complex is parallel to $E_\nu$.
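Theorem III gives a finite test at a vertex. A sketch of it in exact rational arithmetic follows. It assumes the vertex is non-degenerate (exactly $\nu$ zero deviations there), the function names are illustrative, and the data are hypothetical: $x^i = (1, t_i)$ with $t = 0, 1, 2, 3$ and $y = (0, 2, 1, 3)$. Checking the six vertices of this problem by hand shows the smallest $v$ (namely 2) at the vertex through the first and last observations, so that vertex should pass the test, while the vertex through the first two observations should fail.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals; A must be square and non-singular."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def vertex_test(X, y, vertex):
    """Return the vertex u and the solution z of M'z = x* (Theorem III).

    M has the rows x^lambda of the observations in `vertex` (zero deviation there);
    x* = sum of e^k x^k over the remaining observations.
    """
    rows = [X[i] for i in vertex]
    u = solve_exact(rows, [y[i] for i in vertex])        # the zero-complex itself
    x_star = [Fraction(0)] * len(u)
    for i, (row, yi) in enumerate(zip(X, y)):
        if i in vertex:
            continue
        dev = sum(Fraction(a) * b for a, b in zip(row, u)) - yi
        e = 1 if dev > 0 else -1                         # assumes dev != 0
        x_star = [s + e * a for s, a in zip(x_star, row)]
    Mt = [list(col) for col in zip(*rows)]               # M transposed
    return u, solve_exact(Mt, x_star)

X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 2, 1, 3]
for vertex in [(0, 3), (0, 1)]:
    u, z = vertex_test(X, y, vertex)
    print(vertex, u, z, all(abs(t) < 1 for t in z))
```

Exact fractions avoid having to decide whether a computed $|z_\lambda|$ that prints as 1.0 is really equal to one, which is the case that distinguishes a unique minimum from a minimum complex of higher dimension.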

8. Minimization for One Dimension. A method for minimization of (2) when there is just one parameter evolves from the monotonicity of slope in that case. Suppose the variates are $w^i$ and $z^i$, and (1) is

$$v^i = w^i t - z^i. \tag{10}$$

Suppose the variates are arranged in order of $z^i/w^i$, starting with the smallest. The slope of the $r$th segment (Fig. 1) from the left is

$$\sum_{i=1}^{r} |w^i| - \sum_{i=r+1}^{n} |w^i|.$$

The minimum occurs where the slope is 0 or changes from negative to positive; that is, where the first sum equals or exceeds the second, or, equivalently, where the first sum equals or exceeds half the total. This is a standard computation. If the change takes place when $r = k$, then $t = z^k/w^k$ is the value of $t$ giving the minimum.
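The one-dimensional rule amounts to a weighted median of the ratios $z^i/w^i$ with weights $|w^i|$. A minimal sketch follows (the function name is illustrative); observations with $w^i = 0$ do not depend on $t$ and are skipped.

```python
def minimize_one_dimension(w, z):
    """Minimize sum_i |w^i t - z^i| over t by the rule of section 8.

    Order the observations by z^i / w^i and return the first ratio at which the
    running sum of |w^i| reaches half of the total.
    """
    pts = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    if not pts:
        raise ValueError("the objective does not depend on t")
    half = sum(weight for _, weight in pts) / 2
    running = 0.0
    for ratio, weight in pts:
        running += weight
        if running >= half:
            return ratio

print(minimize_one_dimension([1, 1, 10], [0, 1, 50]))   # 5.0
```

Since $|w^i t - z^i| = |w^i|\,|t - z^i/w^i|$, a negative $w^i$ only changes the position of its breakpoint, not the sign of its weight.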

9. Minimization Procedure for $\nu + 1$ Dimensions. For any continuous function with a unique minimum and having the property of Theorem I, the following holds. Let $u_0$ be any point of $E_\nu$. Let $u_{s+1} = u_s + \lambda_s t_s$, where $\lambda_s$ is any direction chosen at random and $t_s$ is the value of $t$ for which the function attains a minimum on the curve $u = u_s + \lambda_s t$. Then the probability is one that $\lim_{s\to\infty} u_s = \bar u$, where $\bar u$ is a minimum point for the function. If $\lambda_s$ is always taken as the gradient at $u_s$, such a procedure is called the "method of steepest descent" for approaching the minimum point.

Usually the limit is never attained. In this case, however, the minimum is attained. The minimum can be approached as closely as desired; hence a complex incident on the minimum is reached. But the convex point sets of Theorem I surrounding the minimum complex are all similar convex polyhedrons in $E_\nu$ whose corresponding faces are parallel, and the gradients at points on a bend cannot point into a higher-dimensional complex on the bend. Hence the sequence of points lies on bends of successively greater degree, and must eventually attain the minimum complex.
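The random-direction procedure described at the start of this section (not the best-gradient procedure the paper develops) can be sketched with the rule of section 8 doing each line search: along $u + d\,t$ the deviations are $w^i t - z^i$ with $w^i = x^i_\alpha d^\alpha$ and $z^i = y^i - x^i_\alpha u^\alpha$, which is exactly the form (10). The data (hypothetical, $x^i = (1, t_i)$ with $t = 0, 1, 2, 3$ and $y = (0, 2, 1, 3)$, whose minimum is $v = 2$ at $u = (0, 1)$), the seed and the number of steps are illustrative assumptions.

```python
import random

def total_deviation(X, y, u):
    """v(u) = sum_i |x^i_alpha u^alpha - y^i|."""
    return sum(abs(sum(a * b for a, b in zip(row, u)) - yi) for row, yi in zip(X, y))

def line_minimum(X, y, u, d):
    """Minimize v on the line u + d t with the weighted-median rule of section 8."""
    w = [sum(a * b for a, b in zip(row, d)) for row in X]                   # w^i
    z = [yi - sum(a * b for a, b in zip(row, u)) for row, yi in zip(X, y)]  # z^i
    pts = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    half, running = sum(weight for _, weight in pts) / 2, 0.0
    for ratio, weight in pts:
        running += weight
        if running >= half:
            return [a + ratio * b for a, b in zip(u, d)]
    return list(u)                     # every w^i is 0: v is constant on this line

def random_direction_descent(X, y, u0, steps=3000, seed=0):
    """u_{s+1} = u_s + d_s t_s with random directions d_s and exact line minima t_s."""
    rng = random.Random(seed)
    u = list(u0)
    history = [total_deviation(X, y, u)]
    for _ in range(steps):
        d = [rng.gauss(0.0, 1.0) for _ in u]
        u = line_minimum(X, y, u, d)
        history.append(total_deviation(X, y, u))
    return u, history

X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 2, 1, 3]
u, history = random_direction_descent(X, y, [0.0, 0.0])
print(u, history[0], history[-1])
```

Each step can only lower $v$ (staying put is always feasible), but, as the text notes, the random version approaches the minimum rather than landing on it; choosing the best gradient instead is what makes the paper's procedure finite.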

TABLE II
Points $u_k$, with $u_{k+1} = u_k + g_k t_k$

$u_0 = (38, 5, -2)$
$u_1 = (37.98202, -4.74828, -1.48457)$
$u_2 = (37.45908, -2.07142, -1.85631)$
$u_3 = (32.83333, -2.07142, -1.76191)$


TABLE III
Computation of $t_k = z^k/w^k$

k    Σ|w^i|/2    Σ|w^i| in order of col. (15) exceeds Σ|w^i|/2 at i =    hence t_k =
0    17521       16                                                      .00599334
1    2502        2                                                       .0397792
2    4610        (illegible)                                             .00496545


TABLE IV
Gradients $g^\alpha$ for column (5k + 8)

k    g^1        g^2      g^3
0    -3         42       86
1    -13146     67293    -9345
2    -931588    0        (illegible)



The computational procedure is as follows:

1. Select a point $u_0$.

2. Determine the gradient $g^\alpha$ from (5).

3. Compute $w^i = x^i_\alpha g^\alpha$, $z^i = y^i - x^i_\alpha u_0^\alpha$.

4. Determine $t_0$ by the method of section 8.

5. Compute $u_1^\alpha = u_0^\alpha + g^\alpha t_0$.

6. Determine the complex containing the best gradient by (9), and the gradient $g_1$ by (7),

and so proceed to the minimum. This may finally be tested by Theorem III.
Step 5 is unnecessary, since the only use for $u_1$ is to determine $e^i(u_1)$. But $e^i(u_1) = e^i(k)$, the latter referring to the computation in step 4. Also, after the first step, it is easier to compute $z^i$ by

$$z^i_{k+1} = z^i_k - w^i_k t_k.$$

10. Example. The computation for (9) is not as great as it would seem, since some of the work is duplication and some must be computed anyway for the gradient. Even so, for $r > 3$ it becomes, perhaps, more arduous than its contribution would seem to justify. For $\nu > 4$ it is recommended that the test of (9) be omitted for points on bends of third degree or greater, and that the final test of Theorem III be applied at the end of the work. If this test shows that the minimum has not been reached, the complex in which the best gradient lies will be indicated at the same time.

The minimum number of steps is 0. The maximum number is tremendous but finite. The expected number is probably a little greater than $\nu$.

In Tables I to IV, the method is applied to the problem used by Rhodes to illustrate his method. The independent variates are shown in columns (2), (3), (4) of Table I, the dependent variate in column (5). The only other original datum is the initial point, selected by guess, shown in line 1 of Table II. Since slightly different formulas were used in the computation, the signs of cols. (6), (8), (11), (16), (18) are reversed, and the gradients in Table IV are multiplied by constants. As they are used only for directions, this does not matter.

Princeton University, 

Princeton, N. J.



A STUDY OF A UNIVERSE OF n FINITE POPULATIONS WITH 
APPLICATION TO MOMENT-FUNCTION ADJUSTMENTS 
FOR GROUPED DATA 

By Joseph A. Pierce 

The object of this paper is to study the case of a universe of $n$ finite populations, considering both the expectations of population moment-functions and the moments of sample moments, and to make applications of the results which may be of interest to mathematical statisticians. The sampling formulas which are derived reduce to the usual infinite or finite sampling formulas under appropriate assumptions. Also, a method is given whereby finite sampling formulas may be transformed into the corresponding infinite sampling formulas.

The general methods and formulas which are given in Part I for the expectations of population moment-functions are used, in Part II, to find the expectations of moments of a distribution of discrete data grouped in "$k$ groupings of $k$".


I. A Study of a Universe of n Finite Populations

Let ${}_nU_N$ be a universe composed of the set of populations ${}_rX$ $(r = 1, 2, \dots, n)$, each population ${}_rX$ consisting of a finite number of discrete variates ${}_rx_i$ $(i = 1, 2, \dots, N)$, $(N > n)$. The $t$th moment of ${}_rX$ is denoted by ${}_r\nu_t$, and the $t$th central moment of ${}_rX$ by ${}_r\mu_t$. The $t$th moment and the $t$th central moment of ${}_nU_N$ are respectively denoted by $\nu_t$ and $\mu_t$. The expected value of a variable $y$ is denoted by $E(y)$. We have

$${}_r\nu_t = E({}_rx_i^t) = \frac{1}{N}\sum_{i=1}^{N} {}_rx_i^t, \qquad {}_r\mu_t = E({}_rx_i - {}_r\nu_1)^t = \frac{1}{N}\sum_{i=1}^{N} ({}_rx_i - {}_r\nu_1)^t,$$

$$\mu'_{1:\nu_t} = E({}_r\nu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\nu_t, \qquad \mu'_{1:\mu_t} = E({}_r\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\mu_t. \tag{1.1}$$

[A further line of (1.1) and a remark on notation are illegible in the source scan.]

1. The expected value of moments and central moments. It follows easily from (1.1) that

$$\mu'_{1:\nu_t} = \nu_t. \tag{1.2}$$
From the usual formula for central moments in terms of moments, we get

$$\mu'_{1:\mu_t} = \sum_{i=0}^{t} (-1)^i \binom{t}{i}\, E({}_r\nu_{t-i}\, {}_r\nu_1^{\,i}). \tag{1.3}$$

Terms of the form $E({}_r\nu_{t-i}\, {}_r\nu_1^{\,i})$ may be evaluated by use of the well-known formulas [20; p. 58] for changing from moments to central moments in the case of a multivariate distribution. Two of these formulas are given below.

(1.4) [Two formulas relating product moments about a fixed point to central product moments; illegible in the source scan.]

We find that

(1.5) [formula illegible in the source scan],

where $p_1 p_2$ is a two-part partition of $i$ and $r_1 + r_2 = 1$.

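Using (1.3) and (1.5), we get

$$\mu'_{1:\mu_2} = \mu_2 - \mu_{2:\nu_1}, \tag{1.6}$$

$$\mu'_{1:\mu_3} = \mu_3 - 3\mu_{11:\nu_1\nu_2} + 6\nu_1\mu_{2:\nu_1} + 2\mu_{3:\nu_1}, \tag{1.7}$$

$$\mu'_{1:\mu_4} = \mu_4 + 6(\mu_2 - 2\nu_1^2)\mu_{2:\nu_1} - 12\nu_1\mu_{3:\nu_1} + 12\nu_1\mu_{11:\nu_1\nu_2} - 4\mu_{11:\nu_1\nu_3} + 6\mu_{21:\nu_1\nu_2} - 3\mu_{4:\nu_1}, \tag{1.8}$$

etc.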

If the $n$ populations are identical, it is evident from the definition of $\mu'_{1:\mu_t}$ that, for all finite $t$,

$$\mu'_{1:\mu_t} = \mu_t.$$
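Formulas (1.6) to (1.8) can be checked in exact arithmetic on a small hypothetical universe. The sketch reads $\mu_{s:\nu_t}$ and $\mu_{s_1s_2:\nu_{t_1}\nu_{t_2}}$ as central moments and central product moments of the population moments ${}_r\nu_t$ taken over the $n$ populations, and the universe moments as those of the pooled $nN$ variates; both readings are assumptions about the notation, and the data and function names are illustrative.

```python
from fractions import Fraction

def raw(xs, t):
    """t-th moment about the origin (divisor len(xs))."""
    return sum(Fraction(x) ** t for x in xs) / len(xs)

def central(xs, t):
    """t-th central moment (divisor len(xs))."""
    m = raw(xs, 1)
    return sum((Fraction(x) - m) ** t for x in xs) / len(xs)

def mean(vals):
    """Arithmetic mean over the n populations (each population equally weighted)."""
    return sum(vals, Fraction(0)) / len(vals)

def expected_central_moments(pops):
    """Return (E(r_mu_t), right side) for t = 2, 3, 4, following (1.6)-(1.8)."""
    pooled = [x for p in pops for x in p]        # the universe nU_N (all N equal)
    nu1 = raw(pooled, 1)
    mu2, mu3, mu4 = (central(pooled, t) for t in (2, 3, 4))
    # deviations of the population moments r_nu_t from their universe values nu_t
    a = [raw(p, 1) - nu1 for p in pops]
    b = [raw(p, 2) - raw(pooled, 2) for p in pops]
    c = [raw(p, 3) - raw(pooled, 3) for p in pops]
    m2_1, m3_1, m4_1 = [mean([x ** k for x in a]) for k in (2, 3, 4)]
    m11_12 = mean([x * y for x, y in zip(a, b)])
    m11_13 = mean([x * y for x, y in zip(a, c)])
    m21_12 = mean([x * x * y for x, y in zip(a, b)])
    lhs = [mean([central(p, t) for p in pops]) for t in (2, 3, 4)]
    rhs = [
        mu2 - m2_1,                                                  # (1.6)
        mu3 - 3 * m11_12 + 6 * nu1 * m2_1 + 2 * m3_1,                # (1.7)
        mu4 + 6 * (mu2 - 2 * nu1 ** 2) * m2_1 - 12 * nu1 * m3_1      # (1.8)
        + 12 * nu1 * m11_12 - 4 * m11_13 + 6 * m21_12 - 3 * m4_1,
    ]
    return list(zip(lhs, rhs))

pops = [[1, 2, 4, 0], [0, 3, 3, 1], [5, 1, 2, 2]]   # hypothetical: n = 3, N = 4
for left, right in expected_central_moments(pops):
    print(left, right, left == right)
```

With identical populations every deviation vanishes and each right side collapses to the universe moment itself, which is the statement just made.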


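2. The expected value of Thiele seminvariants. If the $t$th Thiele seminvariant is denoted by $\lambda_t$, then

$$\mu'_{1:\lambda_t} = \sum \frac{(-1)^{p-1}(p-1)!\,t!}{s_1!\,s_2!\cdots s_t!\,(1!)^{s_1}(2!)^{s_2}\cdots(t!)^{s_t}}\; \mu'_{s_1 s_2 \cdots s_t:\nu_1\nu_2\cdots\nu_t}, \tag{1.9}$$

the summation being taken over all non-negative integers $s_i$ $(i = 1, 2, \dots, t)$ for which

$$\sum_{i=1}^{t} s_i = p, \qquad \sum_{i=1}^{t} i\, s_i = t.$$

Terms of the form $\mu'_{s_1 s_2 \cdots s_t:\nu_1\nu_2\cdots\nu_t}$ may be evaluated by (1.4). We have

$$\mu'_{1:\lambda_2} = \lambda_2 - \mu_{2:\nu_1}, \tag{1.10}$$

$$\mu'_{1:\lambda_3} = \lambda_3 - 3\mu_{11:\nu_1\nu_2} + 6\lambda_1\mu_{2:\nu_1} + 2\mu_{3:\nu_1}, \tag{1.11}$$

$$\mu'_{1:\lambda_4} = \lambda_4 + 12(\lambda_2 - 2\lambda_1^2)\mu_{2:\nu_1} - 24\lambda_1\mu_{3:\nu_1} + 24\lambda_1\mu_{11:\nu_1\nu_2} - 4\mu_{11:\nu_1\nu_3} + 12\mu_{21:\nu_1\nu_2} - 3\mu_{2:\nu_2} - 6\mu_{4:\nu_1}, \tag{1.12}$$

etc.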

If the $n$ populations are identical then, for all finite $t$,

$$\mu'_{1:\lambda_t} = \lambda_t.$$
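Formulas (1.10) to (1.12) can be checked the same way. The sketch below uses the same assumed reading of the notation ($\mu_{s:\nu_t}$ and $\mu_{s_1s_2:\nu_{t_1}\nu_{t_2}}$ as central moments and product moments of the ${}_r\nu_t$ over the $n$ populations, universe moments from the pooled variates), takes the seminvariants $\lambda_1, \dots, \lambda_4$ as the first four cumulants ($\lambda_4 = \mu_4 - 3\mu_2^2$), and uses a hypothetical universe.

```python
from fractions import Fraction

def raw(xs, t):
    """t-th moment about the origin (divisor len(xs))."""
    return sum(Fraction(x) ** t for x in xs) / len(xs)

def central(xs, t):
    """t-th central moment (divisor len(xs))."""
    m = raw(xs, 1)
    return sum((Fraction(x) - m) ** t for x in xs) / len(xs)

def mean(vals):
    return sum(vals, Fraction(0)) / len(vals)

def seminvariants(xs):
    """Thiele seminvariants lambda_1..lambda_4 (the first four cumulants)."""
    m2, m3, m4 = central(xs, 2), central(xs, 3), central(xs, 4)
    return raw(xs, 1), m2, m3, m4 - 3 * m2 ** 2

def check(pops):
    """Return (E(r_lambda_t), right side) pairs for (1.10), (1.11), (1.12)."""
    pooled = [x for p in pops for x in p]
    l1, l2, l3, l4 = seminvariants(pooled)
    a = [raw(p, 1) - raw(pooled, 1) for p in pops]
    b = [raw(p, 2) - raw(pooled, 2) for p in pops]
    c = [raw(p, 3) - raw(pooled, 3) for p in pops]
    m2_1, m3_1, m4_1 = [mean([x ** k for x in a]) for k in (2, 3, 4)]
    m2_2 = mean([x * x for x in b])
    m11_12 = mean([x * y for x, y in zip(a, b)])
    m11_13 = mean([x * y for x, y in zip(a, c)])
    m21_12 = mean([x * x * y for x, y in zip(a, b)])
    ex = [mean([seminvariants(p)[k] for p in pops]) for k in (1, 2, 3)]
    return [
        (ex[0], l2 - m2_1),                                             # (1.10)
        (ex[1], l3 - 3 * m11_12 + 6 * l1 * m2_1 + 2 * m3_1),            # (1.11)
        (ex[2], l4 + 12 * (l2 - 2 * l1 ** 2) * m2_1 - 24 * l1 * m3_1    # (1.12)
         + 24 * l1 * m11_12 - 4 * m11_13 + 12 * m21_12
         - 3 * m2_2 - 6 * m4_1),
    ]

pops = [[1, 2, 4, 0], [0, 3, 3, 1], [5, 1, 2, 2]]   # hypothetical universe
for left, right in check(pops):
    print(left, right, left == right)
```

Exact rational arithmetic makes the comparison an identity check rather than a numerical approximation.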


3. Generalized sampling. It follows from the definition that all rational isobaric moment-functions have the property that they may be expressed in terms of power sums and power product sums with certain coefficients. Of the power sums and power product sums which enter a sampling formula, only the power product sums take different forms depending on the law of variate selection. Now, there are two possible courses which may be followed by one who wishes to derive sampling formulas for the case of a single population:

1. One may decide in advance on the law which he wishes to govern the selection of variates which enter the sample. Then he may apply this law in the evaluation, in terms of moments, of every power product term as it occurs in each formula which is derived.

2. One may derive the formulas for sampling under the condition that the law is unspecified, thereby obtaining formulas which are capable of being interpreted in terms of laws that are decided upon later.

We illustrate the two possible courses by considering the formula

$$\mu_{2:z} = \frac{r}{s}\sum x^2 + \frac{2r(r-1)}{s(s-1)}\sum_{i<j} x_i x_j, \tag{1.13}$$

which Carver [12; p. 102] obtains for the case of finite sampling without replacement, the variates being measured from the mean of the parent population. Here $r$ = the number in the sample, $s$ = the number in the parent population, and $z$ = the algebraic sum of the variates of a sample. Later, by evaluating $\sum x^2$ and $\sum x_i x_j$ in terms of moments, he finds

$$\mu_{2:z} = \frac{r(s-r)}{s-1}\,\mu_2. \tag{1.14}$$

(It should be noted that Carver [12; p. 115] obtained the corresponding formula for infinite sampling by letting $s \to \infty$.)

The preceding development is entirely in accord with the first of the courses stated above. It is also the standard procedure, and is the course followed by such writers as Isserlis [2], Neyman [6], Church [7], Pepper [11] and Dwyer [20] in deriving finite sampling formulas. Also, it is the course followed by such authors as "Student" [1], Tchouproff [3], Church [5], Craig [9], Fisher [10] and Georgescu [13] for the case of sampling from an infinite population.
However, in (1.13) it is possible to employ the definition

$$\frac{2\sum_{i<j} x_i x_j}{s(s-1)} = \mu_{1,1}.$$

Then (1.13) becomes

$$\mu_{2:z} = r\mu_2 + r(r-1)\mu_{1,1}. \tag{1.15}$$

Formula (1.15) may be interpreted as holding for either finite or infinite sampling, depending on the interpretation which is given to $\mu_{1,1}$. It may easily be shown that, if the sampling is from a limited supply, $\mu_{1,1} = -\mu_2/(s-1)$, and (1.15) reduces to (1.14). If the sampling is from an infinite supply, $\mu_{1,1}$ becomes $\mu_1^2 = 0$, and therefore

$$\mu_{2:z} = r\mu_2,$$

which is the formula [12; p. 115] that corresponds, in the infinite case, to (1.14).
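Carver's formulas can be confirmed by enumerating every sample. The sketch below uses a hypothetical parent population of $s = 6$ values and samples of $r = 3$. It computes the exact second moment of $z$ about its mean over all $\binom{s}{r}$ samples drawn without replacement, and over all ordered samples drawn with replacement as a stand-in for the infinite case, and compares the results with (1.14), (1.15) and $r\mu_2$.

```python
from fractions import Fraction
from itertools import combinations, product

def second_moment_of_sum(samples):
    """Second moment about the mean of z (the sample sum), each sample equally likely."""
    sums = [Fraction(sum(smp)) for smp in samples]
    m = sum(sums) / len(sums)
    return sum((z - m) ** 2 for z in sums) / len(sums)

population = [2, 3, 5, 7, 11, 13]     # hypothetical parent population
s, r = len(population), 3
mean = Fraction(sum(population), s)
dev = [x - mean for x in population]  # variates measured from the population mean
mu2 = sum(d * d for d in dev) / s
mu11 = 2 * sum(p * q for p, q in combinations(dev, 2)) / (s * (s - 1))

without_repl = second_moment_of_sum(combinations(population, r))
with_repl = second_moment_of_sum(product(population, repeat=r))

print(without_repl == r * (s - r) * mu2 / (s - 1))    # True, (1.14)
print(without_repl == r * mu2 + r * (r - 1) * mu11)   # True, (1.15), finite reading
print(mu11 == -mu2 / (s - 1))                         # True
print(with_repl == r * mu2)                           # True, infinite case
```

The same expression (1.15) covers both cases; only the value assigned to $\mu_{1,1}$ changes, which is the point of the second course.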

Thus, either of the two courses is possible in the case of sampling from a single population. However, if one wishes to get general formulas which hold for both infinite and finite sampling, he should follow the second course. Similarly, in order to obtain generalized sampling formulas where the relations between the variates are unspecified and the populations are assumed to be different, the second course should be followed.

It appears that Tchouproff [3], [4] was the first to approach the sampling problem from such a general point of view. However, his methods of derivation are quite complicated, and his results, in general, are difficult to apply to a given problem [5], [8].

Samples of $n$ are formed from ${}_nU_N$ by choosing one variate from each of the $n$ populations. A typical sample is

$${}_1x_{i_1},\ {}_2x_{i_2},\ \dots,\ {}_rx_{i_r},\ \dots,\ {}_nx_{i_n}.$$

We define [4; p. 472]

(1.16) [definitions illegible in the source scan]

where $k$ represents the number of possible terms of the given form; $S$ means the $\nu$-fold sum for unequal values of $r_1, r_2, \dots, r_\nu$; and $n^{(\nu)} = n(n-1)\cdots(n-\nu+1)$.


4. Moments and product moments of sample moments. The $t$th moment of the $j$th sample is denoted by ${}_jm_t$. The $s$th moment of ${}_jm_t$ for all $j$ is denoted by $\mu'_{s:m_t}$, where the prime indicates that the moments of the universe are measured about a fixed point. It follows that

$${}_jm_t = \frac{1}{n}\sum_{r=1}^{n} {}_rx_{i_r}^{\,t}, \qquad \mu'_{s:m_t} = E[({}_jm_t)^s]. \tag{1.17}$$

Also, the general product moment, in which the variates of both the sample and the universe are measured about a fixed point, is defined by

$$\mu'_{s_1 s_2 \cdots s_\nu:m_{t_1} m_{t_2} \cdots m_{t_\nu}} = E[({}_jm_{t_1})^{s_1}({}_jm_{t_2})^{s_2}\cdots({}_jm_{t_\nu})^{s_\nu}]. \tag{1.18}$$

As an illustration of the methods used to derive the formulas of this section,
consider a special case of (1.18) when $s_1 = 2$ and $s_i = 0$ $(i = 2, 3, \ldots, v)$. Then

$$\mu'_{2:m'_1} = \frac{1}{n^2}\,E\Big[\sum_{r=1}^{n}{}_rx^2 + \sum'{}_rx\,{}_sx\Big].$$

Therefore, by (1.1), (1.2) and (1.16), we get

$$\mu'_{2:m'_1} = \frac{1}{n^2}\big[n\mu_2 + n^{(2)}\mu_{1,1}\big]. \tag{1.19}$$
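A quick way to check (1.19) is to enumerate every sample from a few small populations. The sketch below assumes the populations are independent (so the joint moment of two populations factors into the product of their means) and that every variate in a population is equally likely; the populations and function names are hypothetical. Exact rational arithmetic makes the comparison an equality rather than a tolerance.

```python
from fractions import Fraction
from itertools import permutations, product

def expected_m1_squared(populations):
    """E[(m'_1)^2] by enumerating every sample formed by drawing one variate
    from each population, all draws independent and equally likely."""
    samples = list(product(*populations))
    total = Fraction(0)
    for sample in samples:
        m1 = Fraction(sum(sample), len(sample))
        total += m1 * m1
    return total / len(samples)

def formula_1_19(populations):
    """Right member of (1.19), assuming independence so that the joint
    moment of populations r and s is _r mu_1 * _s mu_1."""
    n = len(populations)
    mu1 = [Fraction(sum(p), len(p)) for p in populations]
    mu2 = [Fraction(sum(x * x for x in p), len(p)) for p in populations]
    mu_2 = sum(mu2) / n                               # average second moment
    pairs = list(permutations(range(n), 2))           # ordered pairs with r != s
    mu_11 = sum(mu1[r] * mu1[s] for r, s in pairs) / len(pairs)
    return (n * mu_2 + n * (n - 1) * mu_11) / n**2    # n^(2) = n(n - 1)

pops = [[1, 2, 6], [0, 3], [4, 5, 5, 7]]
print(expected_m1_squared(pops) == formula_1_19(pops))
```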


Using the formulas [20; p. 34] relating products of power sums and power
products to expand expressions of the type (1.18), we give, in the
tables below, formulas for moments and product moments of sample moments
through weight six. The number in a cell, taken together with the coefficient at the
top of the same column, gives the coefficient of the moment
which is found in the same vertical division. The coefficients in the vertical
division are coefficients of the entire right members of the formulas for the
respective moments.

Terms of the form $\mu_{t_1 t_2\cdots t_v}$ with $t_1 = t_2 = \cdots = t_v = t$ are sometimes written $\mu_{t^v}$.

The numbers in the cells of the tables are identical with the numbers in the 
cells of the tables given by Dwyer [19; p. 30] for the expected value of partition 
products. 


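5. Moments of central moments of samples of n. The ith central moment of
the jth sample is denoted by ${}_jm_i$. Then,

$${}_jm_i = \frac{1}{n}\sum_{r=1}^{n}\big({}_rx - {}_jm'_1\big)^i \tag{1.20}$$

and

$$\mu'_{s:m_i} = E\Big[\Big\{\frac{1}{n}\sum_{r=1}^{n}\big({}_rx - {}_jm'_1\big)^i\Big\}^s\Big]. \tag{1.21}$$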
TABLE I
After writing $({}_rx - {}_jm'_1)^i$ as the sum of the general term of a binomial series
and then expanding the resulting right member of (1.21) as a product of power
sums [20; p. 19], we get


( 1 . 22 ) 








where $\sum_{j} r_j = s$, … $= p$, and $\pi_1, \pi_2, \ldots$ are the numbers of the repeated
parts of s.

The mean of the ith central moment takes the following simple form, 


(1.23) 


ViiS, 





where the moments in the right member of (1.23) through weight six are given 
in the tables of section four. Also, 


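$$\mu'_{2:m_2} = \mu'_{2:m'_2} - 2\mu'_{2,1:m'_1m'_2} + \mu'_{4:m'_1}, \tag{1.24}$$

$$\mu'_{3:m_2} = \mu'_{3:m'_2} - 3\mu'_{2,2:m'_1m'_2} + 3\mu'_{4,1:m'_1m'_2} - \mu'_{6:m'_1}, \tag{1.25}$$

$$\mu'_{2:m_3} = \mu'_{2:m'_3} + 9\mu'_{2,2:m'_1m'_2} + 4\mu'_{6:m'_1} - 6\mu'_{1,1,1:m'_1m'_2m'_3} + 4\mu'_{3,1:m'_1m'_3} - 12\mu'_{4,1:m'_1m'_2}. \tag{1.26}$$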
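Formulas (1.24) through (1.26) are the expectations of polynomial identities that hold sample by sample, since $m_2 = m'_2 - m'^2_1$ and $m_3 = m'_3 - 3m'_2m'_1 + 2m'^3_1$. The following sketch (helper names are ours) checks those identities on arbitrary samples with exact fractions.

```python
from fractions import Fraction

def raw(xs, t):
    """t-th sample moment about the origin, m'_t."""
    return sum(Fraction(x) ** t for x in xs) / len(xs)

def central(xs, t):
    """t-th sample moment about the sample mean, m_t."""
    mean = raw(xs, 1)
    return sum((Fraction(x) - mean) ** t for x in xs) / len(xs)

def check_identities(xs):
    a1, a2, a3 = raw(xs, 1), raw(xs, 2), raw(xs, 3)
    m2, m3 = central(xs, 2), central(xs, 3)
    ok_124 = m2**2 == a2**2 - 2 * a1**2 * a2 + a1**4
    ok_125 = m2**3 == a2**3 - 3 * a1**2 * a2**2 + 3 * a1**4 * a2 - a1**6
    ok_126 = m3**2 == (a3**2 + 9 * a1**2 * a2**2 + 4 * a1**6
                      - 6 * a1 * a2 * a3 + 4 * a1**3 * a3 - 12 * a1**4 * a2)
    return ok_124, ok_125, ok_126

print(check_identities([3, -1, 4, 1, 5, 9]))
```

Because each identity holds for every sample, taking expectations term by term gives (1.24) through (1.26) directly.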


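After substituting from the tables of section four, (1.23) through (1.26) become

$$\mu'_{1:m_2} = \frac{n-1}{n}\,(\mu_2 - \mu_{1,1}), \tag{1.27}$$

$$\mu'_{1:m_3} = \frac{(n-1)(n-2)}{n^2}\big[\mu_3 - 3\mu_{2,1} + 2\mu_{1^3}\big], \tag{1.28}$$

$$\mu'_{1:m_4} = \frac{1}{n^4}\big[n^{(2)}(n^2-3n+3)(\mu_4 - 4\mu_{3,1}) + 3n^{(2)}(2n-3)\mu_{2,2} + 3n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})\big], \tag{1.29}$$

$$\mu'_{1:m_5} = \frac{1}{n^5}\big[n^{(3)}(n^2-2n+2)(\mu_5 - 5\mu_{4,1}) + 10n^{(3)}(n-2)\mu_{3,2} + 10n^{(3)}(n^2-3n+4)\mu_{3,1^2} - 30n^{(3)}(n-2)\mu_{2^2,1} - 10n^{(4)}(n-4)\mu_{2,1^3} + 4n^{(5)}\mu_{1^5}\big]. \tag{1.30}$$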
Vime = - 5n^ + lOn’ -m + 5)((i, - ^ 5 , 1 )

+ - 4n“ + 7n - 5^2 - 10n^'^(2n‘ - 6n + 5 ) 11,2 

(1 31) + ISn'^^Cw' -4n^ + 6n- 5 )mu^ - 60n^'^(n" - 4n + 5 )/xsm 

+ 15n''’(3w - 5 )m2» - 20n^*^(n^ - 3n + 5)^s.i^ 

+ 45n'''(2tt - 5)/t2M2 + 15n“'(n - 5)/z2,i. - 5n'®Vi«]- 
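The mean formulas (1.28) through (1.30) can be checked by brute force. The sketch below assumes independent populations, so that the generalized moment $\mu_{t_1\cdots t_v}$ reduces to an average, over ordered sets of distinct populations, of products of their own moments. The populations are made up for illustration. It enumerates every sample and compares $E(m_3)$, $E(m_4)$ and $E(m_5)$ with the right members of (1.28), (1.29) and (1.30).

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def ff(n, v):
    """Falling factorial n^(v) = n(n-1)...(n-v+1); zero when v > n."""
    return prod(range(n - v + 1, n + 1)) if v <= n else 0

def gen_mu(pops, ts):
    """mu_{t1 t2 ... tv}: mean, over ordered tuples of distinct populations,
    of the product of their raw moments (independent populations assumed)."""
    mom = [[sum(Fraction(x) ** t for x in p) / len(p) for t in range(6)] for p in pops]
    tuples = list(permutations(range(len(pops)), len(ts)))
    if not tuples:                       # fewer populations than factors
        return Fraction(0)
    return sum(prod(mom[r][t] for r, t in zip(rs, ts)) for rs in tuples) / len(tuples)

def expected_central(pops, i):
    """E[m_i] by enumerating every sample (one draw from each population)."""
    samples = list(product(*pops))
    total = Fraction(0)
    for s in samples:
        mean = Fraction(sum(s), len(s))
        total += sum((x - mean) ** i for x in s) / len(s)
    return total / len(samples)

def mean_formulas(pops):
    """Right members of (1.28), (1.29) and (1.30)."""
    n = len(pops)
    g = lambda *ts: gen_mu(pops, ts)
    e3 = Fraction((n - 1) * (n - 2), n**2) * (g(3) - 3 * g(2, 1) + 2 * g(1, 1, 1))
    e4 = (ff(n, 2) * (n*n - 3*n + 3) * (g(4) - 4 * g(3, 1))
          + 3 * ff(n, 2) * (2*n - 3) * g(2, 2)
          + 3 * ff(n, 4) * (2 * g(2, 1, 1) - g(1, 1, 1, 1))) / Fraction(n**4)
    e5 = (ff(n, 3) * (n*n - 2*n + 2) * (g(5) - 5 * g(4, 1))
          + 10 * ff(n, 3) * (n - 2) * g(3, 2)
          + 10 * ff(n, 3) * (n*n - 3*n + 4) * g(3, 1, 1)
          - 30 * ff(n, 3) * (n - 2) * g(2, 2, 1)
          - 10 * ff(n, 4) * (n - 4) * g(2, 1, 1, 1)
          + 4 * ff(n, 5) * g(1, 1, 1, 1, 1)) / Fraction(n**5)
    return e3, e4, e5

pops = [[0, 1], [2, 5], [1, 1, 4], [3, 0], [-1, 2]]
print(mean_formulas(pops) == tuple(expected_central(pops, i) for i in (3, 4, 5)))
```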

$$\mu'_{2:m_2} = \frac{1}{n^4}\big[n^{(2)}(n-1)(\mu_4 - 4\mu_{3,1}) + n^{(2)}(n^2-2n+3)\mu_{2,2} - n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})\big]. \tag{1.32}$$

Vsmj = ■" + 3n®(n — l)(n® — 2n + 5 )/ 24 ,s 

— 2tt^^^(3r!.^ — 6w + 5 )jU 3,8 + n®(w^ — 3?2.^ + 9?i — 15)]U2 j 

(1.33) - 3n''^(n - l)(w - 5 )m4,i^ - 12n®(n* - 4n + 5)ju3,2,i 
+ 4w''\3ra - 5)/i3,i4 - 3ft''‘^(n=‘ - 6n + 15 )m2».i» 

+ n**(3/i2,i« — Mii«)]' 

V 2 'Sa = ^[n®(n “ l)'(n - 2)(|t.e - 6^3, 1 ) - 3n®(n - 2)'(2n - 5)/44,2 

71 

+ n®(n - 2)'(n' - 2n + 10)^,.3 

(1.34) - 6n'"(n - 2)(fj' - 6n + 20 )k3,2,i + - 2)(7n - 10 )m4,i2 

+ 3n‘”(37i* - 12n + 20 );i 2 . + 4n^*\n - 2)(n - 10 )m 3 .h 

+ — 8» + 20)^2242 — 4 w**^( 3 m 2 ,i‘ ~ Mi')]- 


6. The variance of the variance of samples of n. The variance of the variance
of samples of n, when the moments of the universe are measured about a fixed
point, is defined as

$$\mu_{2:m_2} = \mu'_{2:m_2} - \big(\mu'_{1:m_2}\big)^2. \tag{1.35}$$

Therefore, from (1.27) and (1.32),

$$\mu_{2:m_2} = \frac{1}{n^4}\big[n^{(2)}(n-1)(\mu_4 - 4\mu_{3,1}) + n^{(2)}(n^2-2n+3)\mu_{2,2} - n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})\big] - \frac{(n-1)^2}{n^2}\,(\mu_2 - \mu_{1,1})^2. \tag{1.36}$$

Tchouproff [4; p. 492] gave a formula (8) for the variance of the sample
variance, but his result is unwieldy because the moments of the universe
are measured about the mean.
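(1.32) and (1.36) can be tested the same way. This sketch again assumes independent, equally weighted, hypothetical populations, so that each generalized moment is an average of products of the populations' own moments. It computes the second moment and the variance of $m_2$ both by enumeration and from the formulas.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def ff(n, v):
    """Falling factorial n^(v); zero when v > n."""
    return prod(range(n - v + 1, n + 1)) if v <= n else 0

def gen_mu(pops, ts):
    """Generalized moment for independent populations (see (1.16))."""
    mom = [[sum(Fraction(x) ** t for x in p) / len(p) for t in range(5)] for p in pops]
    tuples = list(permutations(range(len(pops)), len(ts)))
    if not tuples:
        return Fraction(0)
    return sum(prod(mom[r][t] for r, t in zip(rs, ts)) for rs in tuples) / len(tuples)

def m2_by_formula(pops):
    n = len(pops)
    g = lambda *ts: gen_mu(pops, ts)
    second = (ff(n, 2) * (n - 1) * (g(4) - 4 * g(3, 1))
              + ff(n, 2) * (n*n - 2*n + 3) * g(2, 2)
              - ff(n, 4) * (2 * g(2, 1, 1) - g(1, 1, 1, 1))) / Fraction(n**4)   # (1.32)
    mean = Fraction(n - 1, n) * (g(2) - g(1, 1))                              # (1.27)
    return second, second - mean**2                                            # (1.36)

def m2_by_enumeration(pops):
    m2s = []
    for s in product(*pops):
        mean = Fraction(sum(s), len(s))
        m2s.append(sum((x - mean) ** 2 for x in s) / len(s))
    e1 = sum(m2s) / len(m2s)
    e2 = sum(m * m for m in m2s) / len(m2s)
    return e2, e2 - e1**2

pops = [[0, 1], [2, 5], [1, 1, 4], [3, 0], [-1, 2]]
print(m2_by_formula(pops) == m2_by_enumeration(pops))
```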
7. Conventional infinite sampling formulas derived from generalized sampling
formulas. The term "infinite sampling" is to be interpreted as meaning:
sampling from an unlimited supply, or sampling from a limited supply with repetitions
permitted. In each of these situations the variates are independent [5; p. 79].
First, it is assumed that the n populations are identical, that is, ${}_1X = {}_2X = \cdots = {}_nX$.
This assumption results in the fact that, for a fixed t, the tth moments of all
n populations are equal, whether measured about a fixed point or about the mean.
Therefore, under the assumption of identical populations, every moment may be
interpreted either as the moment of n identical populations or as the moment of a
single population. The only other assumption is that the sampling is "infinite".

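From the condition of independence [3; p. 141], we have

$$E\big({}_{r_1}x^{t_1}\,{}_{r_2}x^{t_2}\cdots{}_{r_v}x^{t_v}\big) = E\big({}_{r_1}x^{t_1}\big)\cdots E\big({}_{r_v}x^{t_v}\big).$$

Therefore,

$${}_{r_1 r_2\cdots r_v}\mu_{t_1 t_2\cdots t_v} = {}_{r_1}\mu_{t_1}\,{}_{r_2}\mu_{t_2}\cdots{}_{r_v}\mu_{t_v}.$$

Combining the condition of independence with that of identical populations, we
have

$${}_{r_1 r_2\cdots r_v}\mu_{t_1 t_2\cdots t_v} = {}_{r_1}\mu_{t_1}\,{}_{r_2}\mu_{t_2}\cdots{}_{r_v}\mu_{t_v} = \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}. \tag{1.37}$$

By (1.16) and (1.37), we may write

$$\mu_{t_1 t_2\cdots t_v} = \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}. \tag{1.38}$$

Since the only terms of the generalized sampling formulas which are affected
by the assumption of "infinite sampling" are those of the form $\mu_{t_1 t_2\cdots t_v}$, the
problem of obtaining conventional infinite sampling formulas from generalized
sampling formulas is, in practice, a mechanical one. Simply write terms of the
form $\mu_{t_1 t_2\cdots t_v}$ which appear in a generalized sampling formula as $\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}$,
and one automatically obtains the corresponding infinite sampling formula.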

As an illustration of the method, consider the generalized sampling formula
(1.36) for the variance of the sample variance. When (1.38) is utilized to change
it into the corresponding infinite sampling formula, (1.36) becomes

$$\mu_{2:m_2} = \frac{n-1}{n^3}\big[(n-1)(\mu_4 - 4\mu_3\mu_1) - (n-3)\mu_2^2 + 2(2n-3)(2\mu_2\mu_1^2 - \mu_1^4)\big], \tag{1.39}$$

which is the usual formula [20; p. 75] for the variance of the sample variance
when the moments of the universe are measured about a fixed point. If it is
assumed that the moments of ${}_nU_x$ are measured about the mean, formula (1.39)
becomes

$$\mu_{2:m_2} = \frac{n-1}{n^3}\big[(n-1)\mu_4 - (n-3)\mu_2^2\big], \tag{1.40}$$

which was published by "Student" [1; p. 3] in 1908.
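The passage from (1.36) to (1.39) and (1.40) can be illustrated numerically. Here "infinite sampling" is modelled, as the text allows, by drawing with replacement from a small limited supply; the supply itself is hypothetical. The variance of $m_2$ does not depend on the origin, so (1.39) computed about the origin and (1.40) computed about the mean must both agree with the enumeration.

```python
from fractions import Fraction
from itertools import product

def moment(pop, t, about=0):
    return sum((Fraction(x) - about) ** t for x in pop) / len(pop)

def var_m2_by_enumeration(pop, n):
    """Variance of the sample variance m_2 for n draws with replacement."""
    m2s = []
    for s in product(pop, repeat=n):
        mean = Fraction(sum(s), n)
        m2s.append(sum((x - mean) ** 2 for x in s) / n)
    e1 = sum(m2s) / len(m2s)
    return sum(m * m for m in m2s) / len(m2s) - e1**2

def formula_1_39(pop, n):
    mu1, mu2, mu3, mu4 = (moment(pop, t) for t in (1, 2, 3, 4))   # about the origin
    return Fraction(n - 1, n**3) * ((n - 1) * (mu4 - 4 * mu3 * mu1) - (n - 3) * mu2**2
                                   + 2 * (2 * n - 3) * (2 * mu2 * mu1**2 - mu1**4))

def formula_1_40(pop, n):
    mean = moment(pop, 1)
    mu2, mu4 = moment(pop, 2, mean), moment(pop, 4, mean)          # about the mean
    return Fraction(n - 1, n**3) * ((n - 1) * mu4 - (n - 3) * mu2**2)

pop = [0, 1, 1, 3, 7]
for n in (2, 3, 4):
    print(n, var_m2_by_enumeration(pop, n) == formula_1_39(pop, n) == formula_1_40(pop, n))
```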
8. Conventional finite sampling formulas derived from generalized sampling
formulas. The term "finite sampling" is to be interpreted as meaning: sampling
from a limited supply when repetitions are not permitted.

In order to reduce generalized sampling formulas to the corresponding formulas
for finite sampling, the assumptions are made that the n populations are identical
and that N and n are finite, N > n. The selection of variates which enter each
sample is restricted in the following manner. If a variate having a given post-subscript
is chosen, then no other variate having the same post-subscript may be
chosen for the same sample.

Now it is evident that terms of the form $\mu_{t_1 t_2\cdots t_v}$ must be redefined on the
basis of the preceding assumptions. From the expansions [20; p. 32] of power
product sums in terms of products of power sums, we get the formulas for $\mu_{t_1 t_2\cdots t_v}$
which are given in the following tables.

The formulas in the tables of this section are called transformation formulas for
finite sampling or, more briefly, transformation formulas.

The transformation of generalized sampling formulas into corresponding
finite sampling formulas is illustrated by the substitution of

$$\mu_{1,1} = \frac{N\mu_1^2 - \mu_2}{N-1}$$

in (1.27). We get

$$\mu'_{1:m_2} = \frac{N(n-1)}{n(N-1)}\,(\mu_2 - \mu_1^2), \tag{1.41}$$

which is the well-known finite sampling formula for the mean of the variance of
samples of n.
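For finite sampling, the sketch below enumerates all ordered samples drawn without replacement from a small hypothetical supply of N units. It checks the transformation formula for $\mu_{1,1}$ (the mean of $x_a x_b$ over distinct units) and the resulting mean of the sample variance, written about the mean of the supply.

```python
from fractions import Fraction
from itertools import permutations

def mean_m2_without_replacement(pop, n):
    """E[m_2] over all ordered samples of n distinct units of pop."""
    samples = list(permutations(pop, n))      # units are distinct by position
    total = Fraction(0)
    for s in samples:
        mean = Fraction(sum(s), n)
        total += sum((x - mean) ** 2 for x in s) / n
    return total / len(samples)

def finite_formula(pop, n):
    N = len(pop)
    mean = Fraction(sum(pop), N)
    mu2 = sum((x - mean) ** 2 for x in pop) / N           # central second moment
    return Fraction(N * (n - 1), n * (N - 1)) * mu2

def mu11_both_ways(pop):
    N = len(pop)
    mu1 = Fraction(sum(pop), N)
    mu2 = sum(Fraction(x) ** 2 for x in pop) / N          # about the origin
    direct = sum(Fraction(a * b) for a, b in permutations(pop, 2)) / (N * (N - 1))
    return direct, (N * mu1**2 - mu2) / (N - 1)

pop = [2, 3, 3, 8, 10]
print(mu11_both_ways(pop))
for n in (2, 3, 4):
    print(n, mean_m2_without_replacement(pop, n) == finite_formula(pop, n))
```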

From this and the preceding section it is evident that the generalized sampling
formulas may be considered as formulas for either infinite or finite sampling,
depending upon the interpretation given to terms of the form $\mu_{t_1 t_2\cdots t_v}$.

TABLE II

9. Transformation of infinite sampling formulas into corresponding finite
sampling formulas. It is a well-known fact that infinite sampling formulas may
be obtained from those for finite sampling by letting the size of the parent population
become infinite. But, prior to this paper, apparently no one has presented a
method of obtaining finite sampling formulas from infinite sampling formulas.
However, by making use of the relations between finite, infinite, and generalized
sampling, we shall demonstrate that it is possible to transform any infinite
sampling formula into the corresponding finite sampling formula.

Since the infinite sampling formulas are obtained from the generalized sampling
formulas by replacing

$$\mu_{t_1 t_2\cdots t_v} \quad\text{by}\quad \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v},$$

it follows that generalized sampling formulas may be obtained from the infinite
sampling formulas by replacing

$$\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v} \quad\text{by}\quad \mu_{t_1 t_2\cdots t_v}. \tag{1.42}$$
However, it must be emphasized that the application of (1.42) demands formulas
which are expressed in terms of moments of sample moments rather than central
moments of sample moments (although the sample moments may be measured
about a fixed point or about the mean), and the moments of the universe must be
measured about a fixed point. The reason for these restrictions is to ensure that
each term is accounted for individually.

After replacements (1.42) are made in the formula for sampling from an
infinite population, the resulting formula is the corresponding generalized one.
The step to the corresponding finite sampling formulas is simply the one outlined
in section eight, namely, the use of the transformation formulas.

We shall consider, as the first illustration, the infinite sampling formula for
the mean of the sample variance when the moments of the parent population are
measured about the mean. The formula is

$$\mu_{1:m_2} = \frac{n-1}{n}\,\mu_2. \tag{1.43}$$

When (1.43) is expressed in terms of moments of the parent population about a
fixed point, we have

$$\mu'_{1:m_2} = \frac{n-1}{n}\,(\mu_2 - \mu_1^2). \tag{1.44}$$

Following (1.42), $\mu_1^2$ is replaced by $\mu_{1,1}$, and (1.44) becomes (1.27). The use of
the transformation formula for $\mu_{1,1}$ gives (1.41) which, when the moments of the
parent population are measured about the mean, becomes

$$\mu_{1:m_2} = \frac{N(n-1)}{n(N-1)}\,\mu_2. \tag{1.45}$$

Infinite sampling formulas expressed in terms of moment-functions may be
similarly transformed into the corresponding finite sampling formulas. For
example, Craig [9; p. 57] gives the second Thiele seminvariant of the variance
of samples as

$$\lambda_{2:m_2} = \frac{n-1}{n^3}\big[(n-1)\lambda_4 + 2n\lambda_2^2\big]. \tag{1.46}$$

First, we express (1.46) in terms of moments about a fixed point by use of the
formulas relating Thiele seminvariants and moments [9; p. 12]. We also recall
that the resulting formula should be expressed in terms of moments of sample
moments rather than in terms of central moments of sample moments. We
obtain

$$\mu'_{2:m_2} = \frac{n-1}{n^3}\big[(n-1)\mu_4 - 4(n-1)\mu_3\mu_1 + (n^2-2n+3)\mu_2^2 - 2(n-2)(n-3)\mu_2\mu_1^2 + (n-2)(n-3)\mu_1^4\big]. \tag{1.47}$$
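Craig's formula (1.46) and its fixed-point form (1.47) can be checked against enumeration for sampling with replacement. In this sketch $\lambda_4 = \mu_4 - 3\mu_2^2$ is the standard relation between the fourth seminvariant and the central moments. The supply of values is hypothetical and deliberately has a nonzero mean, so the $\mu_1$ terms of (1.47) matter.

```python
from fractions import Fraction
from itertools import product

def moment(pop, t, about=0):
    return sum((Fraction(x) - about) ** t for x in pop) / len(pop)

def m2_moments_by_enumeration(pop, n):
    """E[m_2^2] and Var(m_2) for n independent draws with replacement."""
    m2s = []
    for s in product(pop, repeat=n):
        mean = Fraction(sum(s), n)
        m2s.append(sum((x - mean) ** 2 for x in s) / n)
    e1 = sum(m2s) / len(m2s)
    e2 = sum(m * m for m in m2s) / len(m2s)
    return e2, e2 - e1**2

def formula_1_47(pop, n):
    mu1, mu2, mu3, mu4 = (moment(pop, t) for t in (1, 2, 3, 4))   # about the origin
    return Fraction(n - 1, n**3) * ((n - 1) * mu4 - 4 * (n - 1) * mu3 * mu1
                                   + (n*n - 2*n + 3) * mu2**2
                                   - 2 * (n - 2) * (n - 3) * mu2 * mu1**2
                                   + (n - 2) * (n - 3) * mu1**4)

def formula_1_46(pop, n):
    mean = moment(pop, 1)
    lam2 = moment(pop, 2, mean)
    lam4 = moment(pop, 4, mean) - 3 * lam2**2
    return Fraction(n - 1, n**3) * ((n - 1) * lam4 + 2 * n * lam2**2)

pop = [1, 2, 2, 5, 9]
for n in (2, 3, 4):
    print(n, m2_moments_by_enumeration(pop, n) == (formula_1_47(pop, n), formula_1_46(pop, n)))
```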
The next step is to transform (1.47) into the corresponding generalized sampling
formula by use of (1.42). We obtain (1.32). Since we desire to obtain the
finite sampling formula which exactly corresponds to (1.46), it is necessary to
transform (1.32) from the second moment of $m_2$ to the variance of $m_2$, and we get
(1.36). Next the transformation formulas are applied to (1.36). When the moments
of the parent population are measured about the mean and are replaced
by Thiele seminvariants, (1.36) becomes

(1.48) n^iN - DKN - 2){N ^ ^ 

+ 2{N\ - ZNn - SAT + 3n + 3)X2]. 

Formula (1.48) gives the second Thiele seminvariant of the variance of samples of
n drawn from a finite parent population of N. When N → ∞ in (1.48), we
obtain immediately (1.46).

It is generally true that infinite sampling formulas are more easily derived than
are the corresponding finite sampling formulas. The methods of this section
make it possible to derive the desired sampling formulas for the infinite parent
population and then transform these infinite sampling formulas into the corresponding
finite sampling formulas.


II. Moment Function Adjustments for Grouped Data 


A given distribution of discrete variates may be grouped in "k groupings of k".
We desire to find the correction which eliminates the error made in replacing a
given moment of the original distribution by the average of the corresponding
moments of the k grouped-distributions.

Formulas for the adjustments for moments of a grouped-distribution of
discrete variates were first given (without proof) in the Editorial of Vol. I, No. 1
of the Annals of Mathematical Statistics. Later, more satisfactory derivations
of adjustment formulas were given by Abernethy [24], Craig [25] and Carver [26].
However, it was observed by Carver [26; p. 162] that the developments of
Abernethy and Craig are adjustments about a fixed point and that they fail to
hold for the case of expectations of central moments if we accept the definition

$$\mu_{1:\mu_t} = \frac{1}{k}\sum_{r=1}^{k}{}_r\mu_t, \qquad (t = 2, 3, \ldots).$$

Here ${}_r\mu_t$ represents the tth central moment of the rth grouped-distribution. The
formula for the true value of $\mu_{1:\mu_2}$ was supplied by Carver [26; p. 162], but he did
not indicate a general method which might be used for the derivation of $\mu_{1:\mu_t}$
$(t > 2)$.

A distribution of discrete variates grouped in "k groupings of k" is a special
case of a universe of n finite populations, and hence the methods and formulas
for the expectations of population moments are applicable to our present
problem.
It is found that the adjustment formulas for moment-functions of grouped
data involve central moments of a rectangular distribution. It will be convenient
for our present purposes to give a brief treatment of the moment-functions
of a rectangular distribution.

1. Moment-functions of a rectangular distribution. Consider the rectangular
distribution of discrete variates,

$$h,\ 2h,\ 3h,\ \ldots,\ kh. \tag{2.1}$$

It is readily shown that the moment generating function of (2.1),

$$G_R(\theta) = M_{0:R} + M_{1:R}\,\theta + M_{2:R}\,\frac{\theta^2}{2!} + \cdots + M_{n:R}\,\frac{\theta^n}{n!} + \cdots, \tag{2.2}$$

may be written

$$G_R(\theta) = e^{\frac12(k+1)h\theta}\,\frac{\sinh\frac12 kh\theta}{k\sinh\frac12 h\theta}. \tag{2.3}$$

Setting the expansion of the right member of (2.3) equal to the right member of
(2.2) and equating coefficients of like powers of θ, we obtain the following recursion
formula for the moments of (2.1):


(2.4) 




+ (- 1 ) 


i-i (n + Lf-i 


rl 


— h’' ^ ‘ ‘ — k h , 


where $M_{n:R}$ represents the nth moment of a rectangular distribution. Formulas
for $M_{n:R}$ $(n = 0, 1, \ldots, 10)$ are given below. See Sasuly [27; p. 27].

$$\begin{aligned}
M_{0:R} &= 1,\\
M_{1:R} &= \tfrac12(k+1)h,\\
M_{2:R} &= \tfrac16(k+1)(2k+1)h^2 = \tfrac13(2k+1)h\,M_{1:R},\\
M_{3:R} &= \tfrac12 k(k+1)h^2\,M_{1:R},\\
M_{4:R} &= \tfrac15(3k^2+3k-1)h^2\,M_{2:R},\\
M_{5:R} &= \tfrac13(2k^2+2k-1)h^2\,M_{3:R},\\
M_{6:R} &= \tfrac17(3k^4+6k^3-3k+1)h^4\,M_{2:R},\\
M_{7:R} &= \tfrac16(3k^4+6k^3-k^2-4k+2)h^4\,M_{3:R},\\
M_{8:R} &= \tfrac1{15}(5k^6+15k^5+5k^4-15k^3-k^2+9k-3)h^6\,M_{2:R},\\
M_{9:R} &= \tfrac15(2k^6+6k^5+k^4-8k^3+k^2+6k-3)h^6\,M_{3:R},\\
M_{10:R} &= \tfrac1{11}(3k^8+12k^7+8k^6-18k^5-10k^4+24k^3+2k^2-15k+5)h^8\,M_{2:R}.
\end{aligned} \tag{2.5}$$
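Each line of (2.5) is a ratio of power sums, since $M_{t:R} = h^t\sum_{i=1}^{k} i^t/k$, so the table can be checked directly. The sketch below (function names are ours) compares the listed recursions with the direct sums for several k and h.

```python
from fractions import Fraction

def raw_moment(k, h, t):
    """M_{t:R}: t-th moment about the origin of h, 2h, ..., kh (equal weights)."""
    return sum(Fraction(i * h) ** t for i in range(1, k + 1)) / k

def table_2_5(k, h):
    h = Fraction(h)
    M = {0: Fraction(1), 1: Fraction(k + 1, 2) * h}
    M[2] = Fraction(2 * k + 1, 3) * h * M[1]
    M[3] = Fraction(k * (k + 1), 2) * h**2 * M[1]
    M[4] = Fraction(3*k**2 + 3*k - 1, 5) * h**2 * M[2]
    M[5] = Fraction(2*k**2 + 2*k - 1, 3) * h**2 * M[3]
    M[6] = Fraction(3*k**4 + 6*k**3 - 3*k + 1, 7) * h**4 * M[2]
    M[7] = Fraction(3*k**4 + 6*k**3 - k**2 - 4*k + 2, 6) * h**4 * M[3]
    M[8] = Fraction(5*k**6 + 15*k**5 + 5*k**4 - 15*k**3 - k**2 + 9*k - 3, 15) * h**6 * M[2]
    M[9] = Fraction(2*k**6 + 6*k**5 + k**4 - 8*k**3 + k**2 + 6*k - 3, 5) * h**6 * M[3]
    M[10] = Fraction(3*k**8 + 12*k**7 + 8*k**6 - 18*k**5 - 10*k**4
                     + 24*k**3 + 2*k**2 - 15*k + 5, 11) * h**8 * M[2]
    return M

for k in (1, 2, 3, 5):
    M = table_2_5(k, 1)
    print(k, all(M[t] == raw_moment(k, 1, t) for t in range(11)))
```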
The deviations about the mean of (2.1) are

$$\tfrac12(k-1)h,\ \tfrac12(k-3)h,\ \ldots,\ -\tfrac12(k-3)h,\ -\tfrac12(k-1)h. \tag{2.6}$$

Therefore,

$$\mu_{2n+1:R} = 0. \tag{2.7}$$

If we denote (2.6) by x, we have

$$G_x(\theta) = \frac{\sinh\frac12 kh\theta}{k\sinh\frac12 h\theta}. \tag{2.8}$$


The recursion formula for central moments of (2.1) is 


(2.9) 


(2n 4- 1) 
1 ! 


(1) 


M2n R I 2^ 


(2n + 1)® . 


3! 


M2p- 2 fl + • • • 

V (2n + 1)''+'' 
'^2^ (r+1)' 


jiin-rK + 


22" 


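Formulas for $\mu_{2n:R}$ $(n = 0, 1, \ldots, 5)$ are given below. See [27; p. 27].

$$\begin{aligned}
\mu_{0:R} &= 1,\\
\mu_{2:R} &= \tfrac1{12}(k^2-1)h^2,\\
\mu_{4:R} &= \tfrac1{20}(3k^2-7)h^2\,\mu_{2:R},\\
\mu_{6:R} &= \tfrac1{112}(3k^4-18k^2+31)h^4\,\mu_{2:R},\\
\mu_{8:R} &= \tfrac1{960}(5k^6-55k^4+239k^2-381)h^6\,\mu_{2:R},\\
\mu_{10:R} &= \tfrac1{2816}(3k^8-52k^6+410k^4-1636k^2+2555)h^8\,\mu_{2:R}.
\end{aligned} \tag{2.10}$$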

From the relation which connects Thiele seminvariants and the moment
generating function, we get, see [25; p. 57],

$$\lambda_{2n-1:R} = 0, \qquad \lambda_{2n:R} = (-1)^{n+1}\,\frac{B_n\,(k^{2n}-1)\,h^{2n}}{2n}, \qquad n = 1, 2, 3, \ldots, \tag{2.11}$$

where $\lambda_{n:R}$ represents the nth Thiele seminvariant of a rectangular distribution
of discrete variates, and $B_n$ $(n = 1, 2, \ldots)$ are the Bernoulli numbers $\tfrac16, \tfrac1{30}, \tfrac1{42}, \ldots$.
In each of the cases considered in this section, corresponding formulas may be
found for a rectangular distribution of continuous variates by setting h = m/k
(which makes the range m with k subdivisions) and then letting k → ∞.
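Similarly, (2.10) and (2.11) can be checked against the deviations (2.6). The sketch uses the relations $\lambda_4 = \mu_4 - 3\mu_2^2$ and $\lambda_6 = \mu_6 - 15\mu_4\mu_2 + 30\mu_2^3$, which hold when the odd central moments vanish. It also uses the older indexing of the Bernoulli numbers found in the text ($B_1 = 1/6$, $B_2 = 1/30$, $B_3 = 1/42$). Helper names are ours.

```python
from fractions import Fraction

BERNOULLI = {1: Fraction(1, 6), 2: Fraction(1, 30), 3: Fraction(1, 42)}

def central_moment(k, h, t):
    """t-th moment of the deviations (2.6), taken with equal weights 1/k."""
    devs = [Fraction(k - (2 * r - 1), 2) * Fraction(h) for r in range(1, k + 1)]
    return sum(d ** t for d in devs) / k

def table_2_10(k, h):
    h = Fraction(h)
    m2 = Fraction(k * k - 1, 12) * h**2
    return {0: Fraction(1), 2: m2,
            4: Fraction(3*k**2 - 7, 20) * h**2 * m2,
            6: Fraction(3*k**4 - 18*k**2 + 31, 112) * h**4 * m2,
            8: Fraction(5*k**6 - 55*k**4 + 239*k**2 - 381, 960) * h**6 * m2,
            # constant term 2555 reproduces the direct sums for k = 2, 3, 4, 5
            10: Fraction(3*k**8 - 52*k**6 + 410*k**4 - 1636*k**2 + 2555, 2816) * h**8 * m2}

def lambda_2_11(k, h, n):
    return (-1) ** (n + 1) * BERNOULLI[n] * (k ** (2 * n) - 1) * Fraction(h) ** (2 * n) / (2 * n)

def seminvariants_from_moments(k, h):
    m = {t: central_moment(k, h, t) for t in (2, 4, 6)}
    return {2: m[2],
            4: m[4] - 3 * m[2]**2,
            6: m[6] - 15 * m[4] * m[2] + 30 * m[2]**3}

print(table_2_10(3, 1)[10] == central_moment(3, 1, 10), seminvariants_from_moments(3, 1))
```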

2. Adjustments for moments. As our basic distribution we consider the set of
discrete variates $x_i$ $(i = 1, 2, \ldots, N)$, where some of the $x_i$'s may not be
distinct. We assume that the given distribution is grouped in "k groupings
of k".

When $x_i$ is placed in the rth position of a class, the limits of the class are
$x_i - (r-1)h$ and $x_i + (k-r)h$, and the class mark is $x_i + \frac12[k-(2r-1)]h$.
Thus, when the class mark is used as the value of $x_i$, the quantity $\frac12[k-(2r-1)]h$
is added to the true value of $x_i$. Therefore, when the expected value of a
particular moment for "k groupings of k" is found, each variate has made a
definite contribution as it was placed in each of the k positions of a class.

For convenience, we define

$$e_r = \frac{k-(2r-1)}{2}\,h, \qquad (r = 1, 2, \ldots, k). \tag{2.12}$$

As was previously indicated, the expected value of a given moment involves
the contribution of each variate as it occupies the k class positions. A convenient
method of finding these contributions is by means of a universe ${}_kU_x$,
which is composed of the populations ${}_rX$ $(r = 1, 2, \ldots, k)$. The rth population
consists of the values of the variates when they occupy the rth position of the
class. Hence ${}_rX$ consists of ${}_rx_i = x_i + e_r$ $(i = 1, 2, \ldots, N)$.

The notation for moments is the same as that of Part I. Since ${}_kU_x$ is of the
same form as the universe studied in Part I, we use the definitions (1.1) of that
part.

The expected value of the tth moment is

$$\mu_{1:\mu'_t} = \frac{1}{kN}\sum_{r=1}^{k}\sum_{i=1}^{N}(x_i + e_r)^t.$$


Many devices have been used by previous writers [24; p. 269], [25; p. 57],
[26; p. 157] to evaluate terms of the form $\frac1k\sum_{r=1}^{k}e_r^s$. However, it should be
noticed that the quantities $e_r$ $(r = 1, 2, \ldots, k)$ are respectively identical
with the deviations (2.6) about the mean of a rectangular distribution of discrete
variates. It follows that

$$\frac1k\sum_{r=1}^{k}e_r^s = \mu_{s:R}.$$

And since $\mu_{2s+1:R} = 0$, we have

$$\mu_{1:\mu'_t} = \sum_{s=0}^{[t/2]}\binom{t}{2s}\,\mu'_{t-2s}\,\mu_{2s:R}. \tag{2.13}$$

Formulas for $\mu_{2s:R}$ $(s = 0, 1, \ldots, 5)$ are given by (2.10).

If the class marks are selected as the unit of x, we set h = 1 in (2.10). If the
class interval is chosen as the unit of x, we set h = 1/k in (2.10). If k consecutive
values of the discrete variable are grouped in a frequency class of width
m, we put h = m/k in (2.10).

Usually we desire to estimate the value of the moments that would have been
obtained if we had not grouped the data. Therefore (2.13) is solved for the
moments of the ungrouped data. We have

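$$\mu'_t = \sum_{p=0}^{[t/2]}\binom{t}{2p}\,P_{2p}\,\mu_{1:\mu'_{t-2p}}, \tag{2.14}$$

wherein

$$P_{2p} = \sum\frac{(-1)^{\pi}\,(2p)!\,\pi!}{[(2p_1)!]^{\pi_1}[(2p_2)!]^{\pi_2}\cdots[(2p_u)!]^{\pi_u}\,\pi_1!\,\pi_2!\cdots\pi_u!}\;\mu_{2p_1:R}^{\pi_1}\,\mu_{2p_2:R}^{\pi_2}\cdots\mu_{2p_u:R}^{\pi_u},$$

the summation being taken for every possible product of moments for which

$$\sum_{i=1}^{u}p_i\pi_i = p, \qquad \sum_{i=1}^{u}\pi_i = \pi.$$

Formulas corresponding to (2.13) and (2.14) for a distribution of continuous
variates are written by replacing the moment symbols for discrete variates by
those for continuous variates.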
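Formula (2.13) and its inverse (2.14) can be checked on the data of Table III. The sketch forms the three groupings of three by shifting the class boundaries, averages the grouped moments about the fixed point 4, and compares the result with (2.13). It then recovers the ungrouped moments with (2.14) using $P_0 = 1$, $P_2 = -\mu_{2:R}$ and $P_4 = 6\mu_{2:R}^2 - \mu_{4:R}$. These are the first coefficients of the expansion, worked out by us, and they suffice for $t \le 5$.

```python
from fractions import Fraction
from math import comb

X = list(range(1, 10))                       # variates of Table III
F = [2, 8, 10, 30, 4, 3, 1, 1, 1]            # their frequencies
K, ORIGIN = 3, 4                             # groupings of three, fixed point 4

def raw_moment(values, t):
    """t-th moment about ORIGIN, weighted by the Table III frequencies."""
    return sum(f * Fraction(v - ORIGIN) ** t for v, f in zip(values, F)) / sum(F)

def grouped_marks(offset):
    """Class mark of each x when classes of width K start at offset (mod K)."""
    return [K * ((x - offset) // K) + offset + Fraction(K - 1, 2) for x in X]

def average_grouped_moment(t):
    return sum(raw_moment(grouped_marks(o), t) for o in range(K)) / K

def rect(t):
    """mu_{t:R}: central moments of the rectangular distribution (h = 1)."""
    return sum(Fraction(K - (2 * r - 1), 2) ** t for r in range(1, K + 1)) / K

def formula_2_13(t):
    return sum(comb(t, 2 * s) * raw_moment(X, t - 2 * s) * rect(2 * s)
               for s in range(t // 2 + 1))

P = {0: Fraction(1), 2: -rect(2), 4: 6 * rect(2) ** 2 - rect(4)}   # P_0, P_2, P_4

def formula_2_14(t):
    """Ungrouped moment recovered from grouped averages (valid here for t <= 5)."""
    return sum(comb(t, 2 * p) * P[2 * p] * average_grouped_moment(t - 2 * p)
               for p in range(t // 2 + 1))

for t in range(6):
    print(t, formula_2_13(t) == average_grouped_moment(t), formula_2_14(t) == raw_moment(X, t))
```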

3. Adjustments for central moments. Consider the universe U which consists
of the populations ${}_rX$ $(r = 1, 2, \ldots, k)$, where ${}_rX$ is the rth grouped-distribution.

The expected value of the tth central moment of the k grouped-distributions is
given by (1.3), (1.4) and (1.5) of Part I, where now $\mu_{1:\mu'_t}$ is given by (2.13) of
the preceding section. Thus, the development of this section is identical with
that of section one of Part I, with the single exception that $\mu_{1:\mu'_t} = \mu'_t$ no longer
holds but is replaced by $\mu_{1:\mu'_t} = \mu'_t$ + a correction. Therefore, the formulas for
the adjustments for central moments may be obtained immediately from the
formulas derived in section one, Part I, if the corrections of the preceding section
are inserted. We have

$$\mu_{1:\mu_2} = \mu_2 + \mu_{2:R} - \mu_{2:\mu_1}, \tag{2.15}$$

(2.1G) Pi= P3 + 6piP2 - 3pii:„,„j + 2p3 

(2 17) Pl^4 = P4 -t- 6p2P2.B + Pl:R + 6(p2 — 2p* -|- P2 b)p 2 ,ii 

+ 12pipii-,.iMj — 12piP3,n — 4pii.,,„,3 
4" 6p21.MiM2 3p4-,,, 

The moments of the ungrouped data can be obtained readily from formulas
(2.15) through (2.17).

Adjustment formulas for central moments of a distribution of continuous
variates may be obtained from (2.13) by replacing the moment symbols for
discrete variates by those for continuous variates and taking the moments about
the mean. Also, it may be observed that adjustment formulas for central
moments of a distribution of continuous variates may be obtained from formulas
(1.3), (1.4) and (1.5) of Part I, provided the moment symbols are exchanged as
indicated above and terms of the form P.isj set equal to zero 

4. Usual adjustments for Thiele seminvariants. The usual adjustments for
Thiele seminvariants, for the univariate discrete population, may be developed
directly by use of one of the fundamental properties of Thiele seminvariants.

It is assumed (see [25; p. 55]) that k consecutive values of the discrete variable
are grouped in a frequency class of width m. The k smaller intervals of width
m/k = h go to make up the class width m, the actual points representing the k
values of the variable being plotted at the centers of the sub-intervals. Now,
let us suppose that each of the k consecutive boundary points of the subintervals
is as likely to be chosen as a boundary point of the larger intervals as any other.
Then, if $x_c$ is the class mark of the cth frequency class, for any true value x of
the discrete variable included in this frequency class, we have

$$x_c = x + e_r,$$

in which x and $e_r$ are independent variables and $e_r$ takes on the k values (2.12)
with equal relative frequencies 1/k.

Since we have noted that the equally likely values which $e_r$ may take on are
deviations about the mean of a rectangular distribution of discrete variates, we
employ the cumulative property of Thiele seminvariants [9; p. 4] and obtain
directly

$$\lambda'_t = \lambda_t + \lambda_{t:R}, \qquad (t = 1, 2, \ldots), \tag{2.18}$$

where $\lambda'_t$ is the tth seminvariant computed from the grouped data, $\lambda_t$ is the
tth seminvariant computed from the ungrouped data, and $\lambda_{t:R}$ is defined by (2.11).

Formulas corresponding to (2.18), for special values of t, are given by Craig
[25; p. 57]. However, the present development indicates the dependence of
adjustment formulas on central moments of a rectangular distribution and provides
a general formula for these adjustments which is expressed completely in
terms of Thiele seminvariants.
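The cumulative property behind (2.18) can be illustrated exactly. If the error $e_r$ is independent of x and takes the values (2.12) with equal probability, the seminvariants of $x + e_r$ are the sums of those of x and of $e_r$. The sketch applies this to Table III with k = 3 and h = 1. Note that it uses the idealized model of this section (a mixture over the k positions), not the average of the seminvariants of the k separate groupings.

```python
from fractions import Fraction
from itertools import product

def seminvariants(dist):
    """(lambda_2, lambda_3, lambda_4) of a distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    m2, m3, m4 = (sum((v - mean) ** t * p for v, p in dist.items()) for t in (2, 3, 4))
    return m2, m3, m4 - 3 * m2 ** 2

freqs = [2, 8, 10, 30, 4, 3, 1, 1, 1]                        # Table III, x = 1..9
x_dist = {x: Fraction(f, sum(freqs)) for x, f in zip(range(1, 10), freqs)}
e_dist = {Fraction(e): Fraction(1, 3) for e in (1, 0, -1)}   # (2.12) with k = 3, h = 1

grouped = {}                                  # distribution of x_c = x + e_r
for (x, px), (e, pe) in product(x_dist.items(), e_dist.items()):
    grouped[x + e] = grouped.get(x + e, 0) + px * pe         # independence of x and e_r

lam_x, lam_e, lam_c = seminvariants(x_dist), seminvariants(e_dist), seminvariants(grouped)
print([c == a + b for a, b, c in zip(lam_x, lam_e, lam_c)])
```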


5. New adjustments for Thiele seminvariants. If we accept the definition

$$\mu_{1:\lambda_t} = \frac1k\sum_{r=1}^{k}{}_r\lambda_t, \qquad (t = 2, 3, \ldots),$$

then (2.18) is at best only an approximation formula. We now desire exact
formulas for $\mu_{1:\lambda_t}$ for the case of a grouped-distribution of discrete variates.
First (1.9) is used, and the resulting terms are evaluated in terms of central
moments by (1.3). Then terms of the form $\mu_{1:\mu'_t}$ are evaluated by
(2.13), and finally the relations between moments and Thiele seminvariants are
employed. Exact formulas for the expected values of the second, third, and
fourth Thiele seminvariants for grouped-distributions of discrete variables are
given below.





$$\mu_{1:\lambda_2} = \lambda_2 + \lambda_{2:R} - \mu_{2:\mu_1}, \tag{2.19}$$


(2.20) 

Ml Xa 

= 

Xa + GXi/Ia.,,, — Sfiu 

"b /i, • 

(2.21) 

Ml Xi 

= 

Xl + X4,B 

+ 12[X2 - 

■ 2Xl Xj r]/I; 



+ 

24[^ii.,,,,,3 

~ fia ,i,]Xi 

~ i/Zii 



+ 

Hfiii 

- SMCxi - 

3/14 . 


Formulas for Thiele seminvariants of ungrouped data in terms of expectations
may be obtained from (2.19) through (2.21).

Adjustment formulas for Thiele seminvariants of a distribution of continuous
variates are given by Langdon and Ore [23; p. 231] and Craig [25; p. 57]. If we
denote the tth Thiele seminvariant of a distribution of continuous variates by
$L_t$, then

$$\mu_{1:L_t} = L_t + L_{t:R}, \tag{2.22}$$

where

$$L_{2t+1:R} = 0, \qquad L_{2t:R} = (-1)^{t+1}\,\frac{B_t\,m^{2t}}{2t}, \qquad t = 1, 2, \ldots. \tag{2.23}$$

Formulas (2.19) through (2.21) may be used for continuous variates by
changing the moment symbols and setting terms of the form …
equal to zero.

6. Adjustment formulas applied to a numerical problem. We consider the
arbitrary distribution given in Table III.

TABLE III
An Arbitrary Distribution of Discrete Variates

  x    f       x    f       x    f       F
  1    2       4   30       7    1       F_1 = 2 + 30 + 1 = 33
  2    8       5    4       8    1       F_2 = 8 + 4 + 1 = 13
  3   10       6    3       9    1       F_3 = 10 + 3 + 1 = 14
The three grouped distributions, when the variates are grouped in "groupings
of three," appear in Table IV.

TABLE IV
Distributions Derived from Data of Table III by Making the Three Possible Groupings of Three

        (1)               (2)               (3)
  Class      f      Class      f      Class       f
  1-3       20      0-2       10      -1 to 1     2
  4-6       37      3-5       44      2-4        48
  7-9        3      6-8        5      5-7         8
  10-12      0      9-11       1      8-10        2

Using the fixed point 4, moment-functions are computed for the distribution of
Table III and for each of the distributions of Table IV. These quantities,
along with the average of each moment-function, appear in Table V.

TABLE V
Moment-Functions of the Distributions of Table III and Table IV, and Averages of the
Moment-Functions of the Distributions of Table IV

 Dist.   μ'_1     μ'_2     μ'_3     μ'_4      μ_2 = λ_2       μ_3 = λ_3          μ_4                   λ_4
 (1)     9/60     165/60   69/60    1125/60   9819/(60)^2     -17,442/(60)^3     238,849,317/(60)^4    -50,388,966/(60)^4
 (2)     -9/60    171/60   81/60    2511/60   10,179/(60)^2   567,162/(60)^3     557,840,277/(60)^4    247,004,154/(60)^4
 (3)     -30/60   162/60   138/60   1938/60   8820/(60)^2     1,317,600/(60)^3   528,282,000/(60)^4    294,904,800/(60)^4
 Ave.    -10/60   166/60   96/60    1858/60   9606/(60)^2     622,440/(60)^3     441,657,198/(60)^4    163,839,996/(60)^4
 Orig.   -10/60   126/60   116/60

Table VI gives the expected values of the moment-functions as obtained by 
substituting from Table V into the formulas of sections two, three, and five. 
Also the expected values, computed from the usual formulas, are given and the 
errors which would be made, if the usual formulas were used, are indicated. 
TABLE VI
Expected Values of Moment-Functions Computed by Formulas

 Expectations by   μ'_1     μ'_2     μ'_3     μ'_4      μ_2 = λ_2     μ_3 = λ_3        μ_4                   λ_4
 New formulas      -10/60   166/60   96/60    1858/60   9606/(60)^2   622,440/(60)^3   441,657,198/(60)^4    163,839,996/(60)^4
 Usual formulas    -10/60   166/60   96/60    1858/60   9860/(60)^2   642,400/(60)^3   416,778,000/(60)^4    133,795,200/(60)^4
 Error                                                                                  -24,879,198/(60)^4


7. Evaluation of $\mu_{2:\mu_1}$. It appears at first that it is necessary to form the
"k groupings of k" in order to evaluate the term $\mu_{2:\mu_1}$ which enters the precise
formula for the expected value of the variance. That was the procedure followed
by Carver [26; p. 161]. However, it is possible to evaluate $\mu_{2:\mu_1}$ from the
ungrouped data without forming a single grouped-distribution.

By definition,

$$\mu_{2:\mu_1} = \frac1k\sum_{r=1}^{k}\big[{}_r\mu_1 - \mu_1\big]^2,$$

where ${}_r\mu_1$ is the mean of the rth grouped-distribution and $\mu_1$ is the mean of the
ungrouped distribution. We wish to study the terms ${}_r\mu_1$ and $\mu_1$. Consider a
set of variates $x_i$ $(i = 1, 2, \ldots, s)$, with corresponding frequencies $f_i$ $(i = 1, 2, \ldots, s)$.
The x's are subject to the condition $x_i - x_{i-1} = 1$, and consequently
some of the f's may be zero. The mean of this distribution is $\sum f_ix_i / \sum f_i$.

We define

$$F_i = f_i + f_{i+k} + f_{i+2k} + \cdots, \qquad (i = 1, 2, \ldots, k).$$

Then, if a grouped-distribution is formed with $x_1$ in the ith $(i = 1, 2, \ldots, k)$
position of a class, the mean of this grouped-distribution is

$$\frac{\sum f_jx_j + \sum_{j=1}^{k}F_j\,e_{i+j-1}}{\sum f_j},$$

where the subscripts of the e's are taken cyclically, that is, $e_{k+1} = e_1$, $e_{k+2} = e_2$,
and so on. Thus, it is evident that, given the expression for the mean of any grouped-distribution
in which $x_1$ is in the ith position of a class, we may form the expression
for the mean of the grouped-distribution in which $x_1$ is in the (i + 1)st
position of a class by a cyclic permutation of the e's of the given expression.
Therefore, it follows that if we call ${}_r\mu_1$ the mean of the grouped-distribution
in which $x_1$ is in the rth $(r = 1, 2, \ldots, k)$ position of a class, then

$${}_r\mu_1 - \mu_1 = \frac{\sum_{j=1}^{k}F_j\,e_{r+j-1}}{\sum f_j}, \qquad (r = 1, 2, \ldots, k).$$

If we define

$$N = \sum_{i=1}^{s}f_i \quad\text{and}\quad \phi_r = \sum_{j=1}^{k}F_j\,e_{r+j-1},$$

then

$$\mu_{2:\mu_1} = \frac{1}{kN^2}\sum_{r=1}^{k}\phi_r^2.$$

Thus, it is evident that $\mu_{2:\mu_1}$ is a function of the frequencies of the variates and
of the e's. The fact that the values of the variates do not enter permits
one to calculate its value quickly.

Consider $\mu_{2:\mu_1}$ for the distribution of Table III. We find

$$\phi_1 = 33e_1 + 13e_2 + 14e_3.$$

Then, by successive cyclic permutations of the e's,

$$\phi_2 = 33e_2 + 13e_3 + 14e_1, \qquad \phi_3 = 33e_3 + 13e_1 + 14e_2.$$

Substituting the values $e_1 = 1$, $e_2 = 0$, $e_3 = -1$, we have $\phi_1 = 19$, $\phi_2 = 1$ and
$\phi_3 = -20$. Therefore,

$$\mu_{2:\mu_1} = \frac{19^2 + 1^2 + (-20)^2}{3(60)^2} = \frac{254}{(60)^2},$$

which is identical with the value which was found when Table V was used.

It follows from the preceding development that, if $F_1 = F_2 = \cdots = F_k$, then $\mu_{2:\mu_1}$ is zero.
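The evaluation of $\mu_{2:\mu_1}$ from the F's and e's, and its use in (2.15), can be reproduced in a few lines for Table III. The sketch also forms the three grouped distributions directly, so that both routes can be compared. Names are ours, and h = 1 is assumed, as in the example.

```python
from fractions import Fraction

X = list(range(1, 10))
F_TABLE = [2, 8, 10, 30, 4, 3, 1, 1, 1]   # frequencies of x = 1..9 in Table III
K = 3                                      # "groupings of three", h = 1
N = sum(F_TABLE)                           # 60

def e(r):
    """(2.12) with h = 1: e_r = (k - (2r - 1)) / 2, r = 1..k."""
    return Fraction(K - (2 * r - 1), 2)

def phis(freqs, k=K):
    """phi_r = sum_j F_j e_{r+j-1}, subscripts of e taken cyclically."""
    F = [sum(freqs[i::k]) for i in range(k)]            # F_1, ..., F_k
    return [sum(F[j] * e((r + j) % k + 1) for j in range(k)) for r in range(k)]

def mean_and_var(values, freqs):
    n = sum(freqs)
    mean = sum(f * Fraction(v) for v, f in zip(values, freqs)) / n
    var = sum(f * (Fraction(v) - mean) ** 2 for v, f in zip(values, freqs)) / n
    return mean, var

phi = phis(F_TABLE)
mu_2_mu1 = sum(p * p for p in phi) / (K * N**2)          # variance of the grouped means

# Direct route: form the three groupings and average their variances.
groupings = [[K * ((x - o) // K) + o + Fraction(K - 1, 2) for x in X] for o in range(K)]
mu1, mu2 = mean_and_var(X, F_TABLE)
grouped = [mean_and_var(g, F_TABLE) for g in groupings]
avg_grouped_var = sum(v for _, v in grouped) / K
adjusted = mu2 + Fraction(K * K - 1, 12) - mu_2_mu1       # right member of (2.15)
print(phi, mu_2_mu1, avg_grouped_var == adjusted)
```

No grouped distribution is needed for `phis`; the grouping step is included only to confirm the shortcut.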

8. Conclusion. The results of this paper include:

1. The derivation of general and specific formulas for the expected values of
population moment-functions.

2. The derivation of generalized sampling formulas under the condition that
samples of n are formed by selecting one variate from each population.

3. Methods for the transformation of generalized sampling formulas into the
corresponding infinite and finite sampling formulas.

4. A method for the transformation of infinite sampling formulas into the
corresponding finite sampling formulas.

5. A demonstration of the fact that adjustment formulas for moment-functions
of grouped data involve central moments of a rectangular distribution.

6. A general formula for the expected value of the tth moment of grouped data.

7. New adjustment formulas for central moments of grouped data.

8. New adjustment formulas for Thiele seminvariants of grouped data.

9. A method for the evaluation of the term $\mu_{2:\mu_1}$ which appears in the precise
adjustment formula for the variance.

Many thanks are due Prof. P. S. Dwyer, to whom the writer is greatly indebted
for advice and encouragement.
THE ANALYSIS OF VARIANCE WHEN EXPERIMENTAL ERRORS
FOLLOW THE POISSON OR BINOMIAL LAWS

By W. G. Cochran

1. Introduction. The use of transformations has recently been discussed by
several writers [1], [2], [3], [4], in applying the analysis of variance to experimental
data where there is reason to suspect that the experimental errors are
not normally distributed. Two types of transformations appear to be coming
into fairly common use: $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$. The former is considered appropriate
where the data are small integers whose experimental errors follow the
Poisson law, while the latter applies to fractions or percentages derived from
the ratio of two small integers, where the experimental errors follow the binomial
frequency distribution. In each case the object of the transformation is to put
the data on a scale in which the experimental variance is approximately the
same on all plots, so that all plots may be used in estimating the standard error
of any treatment comparison. The extent to which these transformations are
likely to succeed in so doing has been examined by Bartlett [2]. The object of
the present paper is to discuss the theoretical basis for these transformations in
more detail, and in particular to examine their relation to a more exact analysis.
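The variance-stabilizing effect described above can be seen numerically. The sketch below is our own illustration with hypothetical parameter values; it does not reproduce Bartlett's analysis. It sums the exact distributions to get the variance of $\sqrt{x}$ for Poisson counts, which settles near 1/4 as the mean grows, and of $\sin^{-1}\sqrt{x/n}$ for a binomial count out of n, which is close to $1/(4n)$ in radians squared. On the original scale the Poisson variance equals the mean.

```python
import math

def var_sqrt_poisson(lam):
    """Variance of sqrt(X) for X ~ Poisson(lam), lam > 0, from a truncated exact sum."""
    kmax = int(lam + 20 * math.sqrt(lam) + 20)
    probs = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(kmax + 1)]
    e_root = sum(p * math.sqrt(k) for k, p in enumerate(probs))
    e_x = sum(p * k for k, p in enumerate(probs))         # E[(sqrt X)^2] = E[X]
    return e_x - e_root ** 2

def var_arcsine_binomial(n, p):
    """Variance of arcsin(sqrt(X/n)) for X ~ Binomial(n, p), in radians squared."""
    probs = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    vals = [math.asin(math.sqrt(x / n)) for x in range(n + 1)]
    mean = sum(q * v for q, v in zip(probs, vals))
    return sum(q * (v - mean) ** 2 for q, v in zip(probs, vals))

for lam in (2, 5, 20, 100):
    print(lam, round(var_sqrt_poisson(lam), 4))
print(round(var_arcsine_binomial(50, 0.3) * 4 * 50, 3))   # ratio to 1/(4n)
```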

2. Experimental variation of the Poisson type. The first step in an exact
statistical analysis of the results of any field experiment is to specify in mathematical
terms (1) how the expected values on each plot are obtained in terms of
unknown parameters representing the treatment and block (or row and column)
effects, and (2) how the observed values on the plots vary about the expected values.
In this section, the variation is assumed to follow the Poisson law.

The specification of the expected values requires some consideration. In the
standard theory of the analysis of variance, treatment and block (or row and
column) effects are assumed to be additive. In the case of a Latin square, for
example, the expected yield $m_i$ of the $i$th plot, which receives the $t$th
treatment and occurs in the $r$th row and the $c$th column, is written

(1) $m_i = G + T_t + R_r + C_c$,

where $G$ is a parameter representing the average level of yield in the
experiment, and $T_t$, $R_r$ and $C_c$ represent the respective effects of the
treatment, row and column to which the plot corresponds. Since the $T$, $R$ and
$C$ constants are required only to measure differences between different
treatments, rows and columns, we may put

(2) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.
If the experimental errors are normally and independently distributed with
equal variance, this specification leads to very simple equations of estimation
for the unknown parameters. The maximum likelihood estimate of $T_t$, for
example, is the difference between the mean yield of all plots receiving that
treatment and the general mean. This type of prediction formula is also fairly
suitable for general use, because it gives a good approximation to most types of
law which might be envisaged, provided that row and column differences are small
in relation to the mean yield. In considering an exact analysis with Poisson
variation, however, the prediction formula is assumed to be chosen, without
reference to computational simplicity, as the one most suitable to describe the
combined actions of treatment and soil effects. The probability of obtaining a
given set of plot yields $x_i$ with expectations $m_i$ may be written

$\prod_i \frac{e^{-m_i} m_i^{x_i}}{x_i!}$.

Thus $L$, the logarithm of the likelihood, is given by

(3) $L = \sum_i (x_i \log m_i - m_i) - \sum_i \log x_i!$.

Hence the maximum likelihood equation of estimation for any parameter $\theta$
assumes the form

(4) $\sum \frac{x_i - m_i}{m_i}\,\frac{\partial m_i}{\partial\theta} = 0$,

where the summation extends over all plots whose expectations involve $\theta$.
The function $\partial m_i/\partial\theta$ will usually involve a number of
parameters. Since the specification of row, column and treatment effects in a
6 × 6 Latin square requires 16 independent parameters, the solution of these
equations may be expected to be laborious, though it may be shortened by the
intelligent use of iterative methods.
The problem of obtaining exact tests of significance is also difficult. The
method of maximum likelihood provides estimates of the variances and covariances
of the treatment constants, which under certain conditions can be assumed to be
normally distributed if there is sufficient replication. This, however, can
hardly be considered an exact “small sample” solution.
These remarks show that the exact solution is somewhat too complicated for
frequent use. The difficulty arises principally because the typical equation of
estimation consists of a weighted sum of the deviations of the observed from the
expected values, the weights being $\frac{1}{m_i}\frac{\partial m_i}{\partial\theta}$.
The factor $1/m_i$ was introduced into the weight by the Poisson variation of the
experimental errors, and must be retained in any theory which claims to apply to
Poisson variation. It is, however, worth considering whether some simplification
cannot be introduced into the equations by assuming some particular form for the
prediction formula. This line of approach seems promising when one considers the
simplification introduced into the “normal theory” case by assuming the
prediction formula to be linear.

For Poisson variation, the linear law does not appear to be particularly
suitable, since it may give negative expectations on some plots (as happens in
the numerical example considered in the next section). Further, while
$\partial m_i/\partial\theta$ becomes a constant, the factor $1/m_i$ remains in
the weight.

The entire weight can be made constant by assuming a linear prediction
formula in the square roots and transforming the data to square roots. For a
Latin square, this prediction formula is written

(5) $\sqrt{m_i} = \alpha_i = G + T_t + R_r + C_c$,

where

(6) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.


To find the maximum value of (3) subject to the restrictions (6), we may use the
method of undetermined multipliers, maximizing

(7) $L + \lambda\left(\sum_t T_t\right) + \mu\left(\sum_r R_r\right) + \nu\left(\sum_c C_c\right)$.


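The equation of estimation for a typical treatment constant $T_t$ becomes

(8) $\sum_{T_t}\left(\frac{x_i - m_i}{m_i}\right)\frac{dm_i}{d\alpha_i}\,\frac{\partial\alpha_i}{\partial T_t} + \lambda = 0$, i.e. $\sum_{T_t}\frac{2(x_i - m_i)}{\sqrt{m_i}} + \lambda = 0$,

since $dm_i/d\alpha_i = 2\alpha_i = 2\sqrt{m_i}$ and $\partial\alpha_i/\partial T_t = 1$.
The summation extends over all plots receiving the treatment $T_t$. If
$a_i = \sqrt{x_i}$, then by Taylor's theorem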


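(9) $x_i - m_i = (a_i - \alpha_i)\frac{dm_i}{d\alpha_i} + \frac{(a_i - \alpha_i)^2}{2!}\frac{d^2 m_i}{d\alpha_i^2}$,

the expansion terminating here because $m_i = \alpha_i^2$. If $m_i$ is reasonably
large, only the first term on the right-hand side need be retained. When $m_i$ is
small, we may use, instead of the exact square root, a quantity $a_i'$ defined so
that

(10) $x_i - m_i = (a_i' - \alpha_i)\frac{dm_i}{d\alpha_i} = 2\sqrt{m_i}\,(a_i' - \alpha_i)$.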


Thus if the analysis is performed on the quantities $a_i'$ instead of on the
original data, equation (8) becomes

(11) $\sum_{T_t} 4(a_i' - \alpha_i) + \lambda = 0$.

On substituting the expectations for $\alpha_i$ from (5), and using (6), we obtain

(12) $\sum_{T_t} 4(a_i' - G - T_t) + \lambda = 0$.

The corresponding equation for $G$ is

(13) $\sum_i 4(a_i' - G) = 0$,

so that $G$ is the general mean of the quantities $a'$. By adding equation (12)
over all treatments, and comparing the total with (13), we find $\lambda = 0$.
Hence $T_t$ is the difference between the mean of $a'$ over all plots receiving
$T_t$ and the general mean of $a'$. In this scale the simplicity of the “normal
theory” equations has apparently been recovered. Actually, the quantities $a'$
are not known exactly, since


(14) $a' = \alpha + \frac{x - m}{2\sqrt{m}}$,

where $\alpha$ is the expected value of $\sqrt{x}$. However, this process provides
a means of successively approximating the maximum likelihood solution. One
chooses first approximations to the quantities $\alpha$, constructs the $a'$'s,
solves for the unknown constants, and so obtains second approximations to the
expected values. The close relation of $a'$ to $\sqrt{x}$ is seen by remembering
one of the common rules for finding square roots. This consists in guessing an
approximate root ($\alpha$), dividing $x$ by the approximate root, and taking the
mean of the approximate root ($\alpha$) and the resulting quotient ($x/\alpha$).
Indeed, since $m = \alpha^2$, (14) can be written $a' = \frac{1}{2}(\alpha + x/\alpha)$.
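Equation (14) and the square-root rule can be checked against each other numerically. The sketch below uses the illustrative names `adjusted_root` and `heron_step`, which are not from the paper. It assumes only that the guess $\alpha$ is positive, and it shows that the two expressions coincide.

```python
def adjusted_root(x: float, alpha: float) -> float:
    """a' from equation (14): alpha + (x - m) / (2 sqrt(m)), with m = alpha**2."""
    m = alpha * alpha
    return alpha + (x - m) / (2.0 * alpha)

def heron_step(x: float, alpha: float) -> float:
    """The classical rule: mean of the guess alpha and the quotient x / alpha."""
    return 0.5 * (alpha + x / alpha)

# The two expressions agree for any positive guess.
for x, alpha in [(9.0, 2.0), (3.0, 2.046), (0.0, 0.786)]:
    assert abs(adjusted_root(x, alpha) - heron_step(x, alpha)) < 1e-12

# Applied repeatedly with x fixed, the rule converges to sqrt(x).
guess = 1.0
for _ in range(6):
    guess = heron_step(9.0, guess)
print(round(guess, 10))   # 3.0
```

In the analysis, $\alpha$ is not the plot's own root but the value predicted from the row, column and treatment constants. For example, `adjusted_root(3.0, 2.046)` is about 1.76, and `adjusted_root(0.0, 0.786)` is about 0.393. A zero count therefore still receives a positive working value.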

The suitability of the linear prediction formula in square roots must be
considered in any example in which the above analysis is being employed. The law
is intermediate in its effects between the linear law and the product law in the
original data. My experience is that it is fairly satisfactory for general use
(cf. [2], p. 72). An exception may occur when it is desired to test the
interaction between two treatments, both of which produce large effects. In this
case the definition chosen for absence of interaction may not coincide at all
closely with the definition implied in using the linear law in square roots. An
example of this case was given in a previous paper [1].

In this connection it should be noted that an approximate “goodness of fit”
test of the validity of the assumptions made may be obtained. Since the
quantities $a_i'$ enter into the equations of estimation with weight 4, the
quantity $4\sum_i (a_i' - \alpha_i)^2$ is distributed approximately as $\chi^2$
with the number of degrees of freedom in the error term of the analysis of
variance. Some idea of the closeness of the approximation may be gathered by
considering the simplest case, in which only the mean yield is being estimated.
In this case the observed values $x$ are assumed to be drawn from the same
Poisson distribution, and the sufficient statistic for the mean $G$ is known to
be $\sum x/n$. Here, however, the prediction formula is the same in square roots
as in the original scale, and the maximum likelihood solution is invariant to
change of scale. The mean value $\alpha$ of $a'$ must therefore be exactly
$\sqrt{\sum x/n}$, as the reader may verify by working any particular example.
Thus $\sum 4(a' - \alpha)^2$ is found to be the usual $\chi^2$ test for examining
whether a set of values $x$ may reasonably be assumed to come from the same
Poisson distribution. By working out the exact distribution of
$\sum(x_i - \bar{x})^2/\bar{x}$ in a number of cases [5], I previously expressed
the opinion that this quantity followed the $\chi^2$ distribution sufficiently
closely for most practical uses, even for values of the mean as low as 2. This
opinion has since been substantiated by Sukhatme [6], who sampled this
distribution for $m = 1, 2, 3, 4$, and 5.
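The identity in the simplest case can be verified directly. In that case only a common mean is fitted, and the quantity $4\sum(a' - \alpha)^2$ reduces to the familiar dispersion index $\sum(x - \bar{x})^2/\bar{x}$. The helper name `poisson_dispersion_check` is illustrative. The sample counts are the first row of Table I, used only as example input.

```python
import math

def poisson_dispersion_check(xs):
    """All x assumed to come from one Poisson law; only the mean is fitted.

    Returns (mean of a', alpha, 4*sum((a' - alpha)^2), sum((x - xbar)^2) / xbar)."""
    n = len(xs)
    xbar = sum(xs) / n
    if xbar <= 0:
        raise ValueError("need a positive mean")
    alpha = math.sqrt(xbar)                             # fitted value in the square-root scale
    a_prime = [0.5 * (alpha + x / alpha) for x in xs]   # equation (14) with m = xbar
    chi2_roots = 4 * sum((a - alpha) ** 2 for a in a_prime)
    chi2_counts = sum((x - xbar) ** 2 for x in xs) / xbar
    return sum(a_prime) / n, alpha, chi2_roots, chi2_counts

mean_a, alpha, c1, c2 = poisson_dispersion_check([3, 2, 5, 1, 4])
print(round(c1, 4), round(c2, 4))   # 3.3333 3.3333
```

Both claims of the text hold exactly here. The mean of the $a'$ equals $\sqrt{\bar{x}}$, and the two test statistics coincide. This follows because $a' - \alpha = (x - \bar{x})/(2\sqrt{\bar{x}})$ when every plot has the same expectation.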

A high value of $\chi^2$ means either that the prediction formula is not
satisfactory, or that the experimental errors are higher than the Poisson
distribution indicates, or that both causes are operating. These effects can
sometimes be separated by examining whether the observed yields deviate from the
expected yields in a systematic or a random manner. If the deviation is
systematic, the prediction formula is probably unsatisfactory.

The type of approach used above resembles in many features the “exact”
analysis for the probit transformation [7]. The principal difference is that in
the case of probits the transformation is made to suit the a priori prediction
formula, which postulates that the probits are a linear function of the dosage,
or of the log (dosage). Thus with probits the equations of estimation still
involve weights in the transformed scale. These do not seriously complicate the
analysis, since only two parameters require to be estimated for a given poison.
With the much greater number of parameters usually involved in specifying the
results of a field experiment, however, a solution which does not involve
weighting becomes much more attractive.

3. Numerical example of the square root transformation. A 5 × 5 Latin square
experiment on the effects of different soil fumigants in controlling wireworms
was selected as an example. The average number of wireworms per plot (total of
four soil samples) was just under five. Previous studies [8], [9] have indicated
that with small numbers per sample, the distribution of numbers of wireworms
tends to follow the Poisson law.

The plan and yields are shown in Table I. The first two figures under the
treatment symbols are the numbers of wireworms and their square roots
respectively. The square roots are regarded as first approximations to the
values $a'$. Two of the plots receiving treatment K gave no wireworms. These
plots are likely to be changed most in the transition from square roots to $a'$,
so better approximations were estimated for them before proceeding with the
calculations. The best simple approximations appeared to be obtained from the
square roots of the means in the original units.

TABLE I
Plan and number of wireworms per plot

              Col. 1   Col. 2   Col. 3   Col. 4   Col. 5    Mean
Row 1           P        O        N        K        M
      (1)       3        2        5        1        4
      (2)     1.73     1.41     2.24     1.00     2.00     1.676
      (3)     1.76     1.45     2.25     1.11     2.00     1.714
      (4)     1.77     1.46     2.25     1.10     2.00     1.716
Row 2           M        K        O        N        P
      (1)       6        0        6        4        4
      (2)     2.45    (0.39)    2.45     2.00     2.00     1.858
      (3)     2.45     0.32     2.50     2.02     2.02     1.862
      (4)     2.46     0.32     2.49     2.02     2.02     1.862
Row 3           O        M        K        P        N
      (1)       4        9        1        6        5
      (2)     2.00     3.00     1.00     2.45     2.24     2.138
      (3)     2.10     3.09     1.00     2.47     2.25     2.182
      (4)     2.13     3.08     1.00     2.46     2.25     2.184
Row 4           N        P        M        O        K
      (1)      17        8        8        9        0
      (2)     4.12     2.83     2.83     3.00    (0.79)    2.714
      (3)     4.18     2.84     2.83     3.00     0.77     2.724
      (4)     4.17     2.84     2.83     3.00     0.77     2.722
Row 5           K        N        P        M        O
      (1)       4        4        2        4        8
      (2)     2.00     2.00     1.41     2.00     2.83     2.048
      (3)     2.14     2.02     1.49     2.04     2.92     2.122
      (4)     2.10     2.03     1.50     2.05     2.90     2.116
Mean  (2)     2.460    1.926    1.986    2.090    1.972    2.087
      (3)     2.526    1.944    2.014    2.128    1.992    2.121
      (4)     2.526    1.946    2.014    2.126    1.988

Treatment means     K        P        O        M        N
      (2)         1.036    2.084    2.338    2.456    2.520
      (3)         1.068    2.116    2.394    2.482    2.544
      (4)         1.058    2.118    2.396    2.484    2.544

(1) Original numbers. (2) Square roots; figures in parentheses are the first
approximations estimated for the two zero plots. (3) Second approximations.
(4) Third approximations.

For the plot in the second row and second column, the square roots of the row,
column and treatment means in the original units are respectively 2.000, 2.145
and 1.095, and the square root of the general mean is 2.227. Hence

$a' = \frac{1}{2}[2.000 + 2.145 + 1.095 - 2(2.227)] = 0.39$.

The other zero value was similarly found to give $a' = 0.79$. The corresponding
estimates from the means of the square roots were considerably too low, since
the $a'$ values tend to be higher than the square roots. The use of “missing
plot” technique gave very poor approximations, because it ignores the fact that
the plots in question had zero yields.
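The starting values for the two zero plots can be reproduced from the original counts of Table I. The sketch follows the rule stated in the text. The four means are taken in the original units, their square roots are combined additively, and the result is halved, because (14) with $x = 0$ gives $a' = \alpha/2$. The names `COUNTS`, `PLAN` and `zero_plot_start` are illustrative, and indices are 0-based.

```python
import math

# Table I: original counts (row by row) and the treatment letter on each plot.
COUNTS = [[3, 2, 5, 1, 4], [6, 0, 6, 4, 4], [4, 9, 1, 6, 5], [17, 8, 8, 9, 0], [4, 4, 2, 4, 8]]
PLAN = ["PONKM", "MKONP", "OMKPN", "NPMOK", "KNPMO"]

def mean(values):
    return sum(values) / len(values)

def zero_plot_start(r, c):
    """Starting value of a' for a zero-count plot in row r, column c (0-based).

    alpha = sqrt(row mean) + sqrt(column mean) + sqrt(treatment mean) - 2 sqrt(general mean),
    all means in the original units; with x = 0, (14) gives a' = alpha / 2."""
    k = len(COUNTS)
    cells = [(i, j) for i in range(k) for j in range(k)]
    row = math.sqrt(mean(COUNTS[r]))
    col = math.sqrt(mean([COUNTS[i][c] for i in range(k)]))
    trt = math.sqrt(mean([COUNTS[i][j] for i, j in cells if PLAN[i][j] == PLAN[r][c]]))
    general = math.sqrt(mean([COUNTS[i][j] for i, j in cells]))
    alpha = row + col + trt - 2 * general
    return alpha / 2

print(round(zero_plot_start(1, 1), 2), round(zero_plot_start(3, 4), 2))   # 0.39 0.79
```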

With the estimated values inserted, the row, column, and treatment means of the
square roots are as shown in Table I. A second approximation to $a'$ was
calculated for each plot. For the plot in the first row and the first column,
the expected yield is

$\alpha = 1.676 + 2.460 + 2.084 - 2(2.087) = 2.046$.

Hence $a' = \frac{1}{2}(2.046 + 3/2.046) = 1.76$. These values constitute the third
set of figures in Table I. In theory, the row, column, and treatment means should
be readjusted after each new value of $a'$ has been obtained, in order to secure
rapid convergence. This is rather laborious in practice, so a complete set of new
plot values was obtained before the means were readjusted. The third
approximations obtained by this method are shown in the fourth lines in Table I
and are correct to two decimal places.
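One round of the successive approximation can be sketched as follows. The square-root-scale means of the current values give the fitted $\alpha$ of every plot through (5), and (14) then gives the new $a'$. All plots are updated from one set of means, as was done in the text. The input is the second lines of Table I, with the two zero plots already at 0.39 and 0.79. The function name `next_approximation` is illustrative. The guard against a non-positive $\alpha$ is an added assumption, since (14) needs a positive root.

```python
# Table I: counts, plan, and first approximations (square roots to two decimals,
# with the two zero plots replaced by the estimated values 0.39 and 0.79).
COUNTS = [[3, 2, 5, 1, 4], [6, 0, 6, 4, 4], [4, 9, 1, 6, 5], [17, 8, 8, 9, 0], [4, 4, 2, 4, 8]]
PLAN = ["PONKM", "MKONP", "OMKPN", "NPMOK", "KNPMO"]
A1 = [
    [1.73, 1.41, 2.24, 1.00, 2.00],
    [2.45, 0.39, 2.45, 2.00, 2.00],
    [2.00, 3.00, 1.00, 2.45, 2.24],
    [4.12, 2.83, 2.83, 3.00, 0.79],
    [2.00, 2.00, 1.41, 2.00, 2.83],
]

def next_approximation(a, counts, plan):
    """Fit the additive model (5) by means of the current values a, then apply (14)."""
    k = len(a)
    general = sum(map(sum, a)) / (k * k)
    row = [sum(r) / k for r in a]
    col = [sum(a[i][j] for i in range(k)) / k for j in range(k)]
    totals = {}
    for i in range(k):
        for j in range(k):
            totals[plan[i][j]] = totals.get(plan[i][j], 0.0) + a[i][j]
    trt = {t: s / k for t, s in totals.items()}   # each treatment occurs k times
    new = []
    for i in range(k):
        new_row = []
        for j in range(k):
            alpha = row[i] + col[j] + trt[plan[i][j]] - 2 * general   # fitted root, eq. (5)
            if alpha <= 0:
                raise ValueError("fitted square root must be positive")
            new_row.append(0.5 * (alpha + counts[i][j] / alpha))      # equation (14)
        new.append(new_row)
    return new

A2 = next_approximation(A1, COUNTS, PLAN)
print([round(v, 2) for v in A2[0]])   # [1.76, 1.45, 2.25, 1.11, 2.0]
```

The first row agrees with the second approximations of Table I. Calling `next_approximation(A2, COUNTS, PLAN)` gives the next round. A plot or two may differ from the printed table in the second decimal, presumably through rounding in the hand computation.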
It is noteworthy how closely the square roots agree with the third
approximations on all plots except those which originally gave zero yields. The
differences between the second and third approximations are trivial.
The next step is to make a $\chi^2$ test by means of the quantity
$4\sum(a' - \alpha)^2$. From the manner in which the values $\alpha$ are
constructed from the $a'$'s, it follows that $\sum(a' - \alpha)^2$ is simply the
error sum of squares in the conventional analysis of variance of the values $a'$.
The analysis of variance of the third approximations is shown in Table II.

TABLE II
Analysis of variance of adjusted square roots

               Degrees of freedom   Sum of squares   Mean square
Rows                   4                2.9815
Columns                4                1.1190
Treatments             4                7.5815          1.8954
Error                 12                4.5970          0.3831


The value of $\chi^2$ is $4 \times 4.597 = 18.39$, with 12 degrees of freedom,
which is just about the 10 percent level. Suppose the hypothesis is regarded as
disproved only when $\chi^2$ exceeds the 5 percent level. The treatment means may
then be tested by regarding them as approximately normally distributed with
variance $\frac{1}{5} \times 0.25 = 0.05$, since the weight 4 of each $a'$
corresponds to a variance of 1/4. It is, however, more prudent to use the actual
error mean square as an estimate of the experimental error variance, performing
the usual tests associated with the analysis of variance. This may be justified
on the grounds that the calculations have produced a set of plot values $a'$ of
equal weight. On this basis the standard error of a treatment mean is
$\sqrt{0.3831/5} = 0.2768$. Treatment K reduced the number of wireworms
significantly below all other treatments, but there is no indication of any
difference between the other treatments. The treatment means may be reconverted
to the original units by squaring.
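The figures quoted from Table II can be recomputed with nothing beyond the Python standard library. For an even number of degrees of freedom $2m$, the $\chi^2$ upper-tail probability has the closed form $e^{-x/2}\sum_{j<m}(x/2)^j/j!$, which is the probability that a Poisson variable with mean $x/2$ does not exceed $m - 1$. That form suffices here because both error terms in this paper have even degrees of freedom. The helper name `chi2_upper_tail_even_df` is illustrative.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x), valid only for even df.

    Uses P(chi2_{2m} > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j          # (x/2)^j / j! built from the previous term
        total += term
    return math.exp(-half) * total

# Figures from Table II (third approximations, 5 x 5 Latin square).
error_ss, error_df, plots_per_treatment = 4.5970, 12, 5

chi2 = 4 * error_ss                   # each a' carries weight 4
p_value = chi2_upper_tail_even_df(chi2, error_df)

se_poisson = math.sqrt(0.25 / plots_per_treatment)                    # variance 1/4 per plot
se_observed = math.sqrt(error_ss / error_df / plots_per_treatment)    # from the error mean square
print(f"chi2 = {chi2:.2f} on {error_df} d.f., P = {p_value:.3f}")
print(f"S.E. of a treatment mean: {se_poisson:.4f} (theory), {se_observed:.4f} (observed)")
```

The probability comes out close to 0.10, in line with the statement that $\chi^2$ lies just about at the 10 percent level. The observed standard error (0.2768) is somewhat larger than the Poisson-theory value (0.2236).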


4. Experimental variation of the binomial type. In this case the yields are
obtained by examining a constant number $n$ of units per plot and noting those
which possess a certain attribute (e.g., plants which are diseased).
Experimental variation is presumed to arise solely from the binomial variation of
the observed fraction $p$ possessing the attribute about the expected fraction
$P$. The expected fraction is specified in terms of unknown parameters
representing the treatment and soil effects.

If $r_i$ is the number possessing the attribute on a typical plot, so that
$p_i = r_i/n$, the likelihood function takes the form

$\prod_i \frac{n!}{r_i!\,(n - r_i)!}\,P_i^{r_i}Q_i^{n - r_i}$.

Hence the terms in the logarithm which involve the unknown parameters are given
by

(15) $L = \sum_i \{r_i \log P_i + (n - r_i)\log Q_i\}$.

The equation of estimation for a typical constant $\theta$ is

(16) $\sum \frac{n(p_i - P_i)}{P_iQ_i}\,\frac{\partial P_i}{\partial\theta} = 0$,

where the summation is over all plots whose expectations involve $\theta$.
As in the Poisson case, an exact solution is laborious because of the weights
$\frac{n}{P_iQ_i}\,\frac{\partial P_i}{\partial\theta}$.

The unequal weighting may be removed by transforming to the variate
$a_i = \sin^{-1}\sqrt{p_i}$, and assuming that the prediction formula is linear
in the transformed scale. For a Latin square the prediction formula is assumed to
be

(17) $\alpha_i = G + T_t + R_r + C_c$,

where the $i$th plot receives treatment $t$ and lies in the $r$th row and $c$th
column. Further

(18) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.
Since $P_i = \sin^2\alpha_i$, $\frac{dP_i}{d\alpha_i} = 2\sqrt{P_iQ_i}$. A set of
variates $a_i'$ is defined so that on each plot

(19) $p_i - P_i = (a_i' - \alpha_i)\frac{dP_i}{d\alpha_i} = 2\sqrt{P_iQ_i}\,(a_i' - \alpha_i)$.

With these substitutions, the equation of estimation for $T_t$, for instance,
becomes

(20) $\sum_{T_t} 4n(a_i' - \alpha_i) + \lambda = 0$,

where, as before, $\lambda$ is an undetermined multiplier. The remainder of the
solution proceeds exactly as in the Poisson case. $T_t$ is found to be the
difference between the mean value of $a_i'$ over all plots receiving this
treatment and the general mean of $a_i'$. A $\chi^2$ test may be made with
$\sum_i 4n(a_i' - \alpha_i)^2$.

From (19)

(21) $a_i' = \alpha_i + \frac{p_i - P_i}{2\sqrt{P_iQ_i}} = \alpha_i + \frac{Q_i - q_i}{2\sqrt{P_iQ_i}}$

(22) $a_i' = \alpha_i + \tfrac{1}{2}\cot\alpha_i - q_i\operatorname{cosec}(2\alpha_i)$,

where $q_i$ is the observed fraction which does not possess the attribute. The
calculation of approximations to $a_i'$ thus involves finding a predicted value
$\alpha_i$ from the treatment and block (or row and column) means, and using
equation (22). Tables [10] of the values of $\sin^{-1}\sqrt{P}$,
$\alpha + \tfrac{1}{2}\cot\alpha$, and $\operatorname{cosec}(2\alpha)$ have been
prepared to facilitate the computations. It should be noted that these tables are
in degrees, whereas the above equations assume that $\alpha_i$ is measured in
radians. In degrees, equation (20) above becomes

(23) $\sum_{T_t}\frac{n\pi^2}{8100}(a_i' - \alpha_i) = 0$,

while

(24) $a_i' = \alpha_i + \frac{180}{\pi}\left\{\tfrac{1}{2}\cot\alpha_i - q_i\operatorname{cosec}(2\alpha_i)\right\}$.
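Equation (24) can be evaluated directly in floating point instead of by table look-up. The sketch below also checks it against the first form of (21), and against the complement route that the worked example later uses for tables covering only 45° to 90°. The function names are illustrative. Angles are taken in degrees, as in the tables.

```python
import math

def working_angle(p: float, alpha_deg: float) -> float:
    """a' in degrees from (24): alpha + (180/pi) * (cot(alpha)/2 - q * cosec(2*alpha)).

    p is the observed fraction with the attribute (q = 1 - p); alpha_deg is the
    expected equivalent angle predicted from the block and treatment means."""
    a = math.radians(alpha_deg)
    q = 1.0 - p
    return alpha_deg + math.degrees(0.5 / math.tan(a) - q / math.sin(2.0 * a))

def working_angle_direct(p: float, alpha_deg: float) -> float:
    """The same quantity from (21): alpha + (p - P) / (2 sqrt(P Q)), converted to degrees."""
    a = math.radians(alpha_deg)
    P = math.sin(a) ** 2
    return alpha_deg + math.degrees((p - P) / (2.0 * math.sqrt(P * (1.0 - P))))

# Control plot, block I, of the corn ear worm example: p = 0.424, expected angle 40.07.
p, alpha = 0.424, 40.07
beta = 90.0 - alpha                    # complement, within the range of the tables
table_entry = beta + math.degrees(0.5 / math.tan(math.radians(beta)))   # alpha + (1/2)cot alpha
cosec_entry = math.degrees(1.0 / math.sin(math.radians(2 * beta)))      # cosec(2 alpha), degrees
via_complement = 90.0 - (table_entry - p * cosec_entry)   # p plays the part of q after complementing

print(round(working_angle(p, alpha), 2))   # 40.63
```

The three routes give the same value, about 40.63. That is close to the 40.7 obtained in the worked example, where the table entries were interpolated mentally.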

As in the Poisson case, the appropriateness of the linearly additive law in
equivalent angles depends on the way in which treatment and soil effects operate.
As Bliss has shown [11], the effect of the transformation is to flatten out the
cumulative normal frequency distribution, extending the range over which it can
be approximated by a straight line.

5. Numerical example of the angular transformation. The data were selected
from a randomized blocks experiment by Carruth [12] on the control by mechanical
and insecticidal methods of damage due to corn ear worm larvae. The control and
the six types of mechanical protection were chosen for analysis, the “yields”
being the percentages of ears unfit for sale. The numbers of ears varied somewhat
from plot to plot, the average being 36.5, but the variations were fairly small
and appeared to be random. It was considered that variations in the weight
($4n$) could be ignored in solving the equations of estimation.

TABLE III
Percentages of unfit ears of corn

                                       Blocks
Treatments         I      II     III     IV      V      VI     Means
1        (1)     42.4   34.3   24.1   39.5   55.5   49.1
         (2)     40.6   35.8   29.4   38.9   48.2   44.5    39.57
         (3)     40.7   36.0   29.4   38.9   48.6   44.6    39.70
2        (1)     23.5   15.1   11.8    9.4   31.7   15.9
         (2)     29.0   22.9   20.1   17.9   34.3   23.5    24.62
         (3)     29.1   23.1   20.3   18.2   34.3   23.5    24.75
3        (1)     33.3   33.3    5.0   26.3   30.2   28.6
         (2)     35.2   35.2   12.9   30.9   33.3   32.3    29.97
         (3)     35.5   35.3   14.5   31.0   33.4   32.4    30.35
4        (1)     11.4   13.5    2.5   16.6   39.4   11.1
         (2)     19.7   21.6    9.1   24.0   38.9   19.5    22.13
         (3)     19.8   21.7   10.0   24.4   39.9   19.6    22.57
5        (1)     14.3   29.0   10.8   21.9   30.8   15.0
         (2)     22.2   32.6   19.2   27.9   33.7   22.8    26.40
         (3)     22.6   32.7   19.2   28.0   33.7   22.9    26.52
6        (1)      8.5   21.9    6.2   16.0   13.5   15.4
         (2)     17.0   27.9   14.4   23.6   21.6   23.1    21.27
         (3)     17.4   28.2   14.5   24.0   22.1   23.2    21.57
7        (1)     16.6   19.3   16.6    2.1   11.1   11.1
         (2)     24.0   26.1   24.0    8.3   19.5   19.5    20.23
         (3)     24.3   26.2   28.8   10.9   20.1   19.5    21.63
Means    (2)     26.81  28.87  18.44  24.50  32.79  26.46   26.31

(1) Percentage. (2) Equivalent angle. (3) Second approximation.

The percentages of unfit ears, the equivalent angles and the second
approximations to $a'$ are shown, in that order, for each treatment in Table III.
The percentages on individual plots vary from 2.1 to 55.5. The second
approximations were calculated from the block and treatment means of the angles.
For the control plot (treatment 1) in block I, for example, the expected value is

$39.57 + 26.81 - 26.31 = 40.07$.

Fisher and Yates's tables of $\alpha + \tfrac{1}{2}\cot\alpha$ and
$\operatorname{cosec}(2\alpha)$ are given for values of $\alpha$ from 45° to 90°,
so we take the complement of the expected value, which is 49.93. Interpolating
mentally from the table, we find

$\alpha + \tfrac{1}{2}\cot\alpha = 74.0$, $\quad\operatorname{cosec}(2\alpha) = 58.3$.

Passing to the complementary angle interchanges the two classes, so the observed
fraction of unfit ears, 0.424, now plays the part of $q_i$ in (24). Thus the
second approximation to the complement of the angle is

$74.0 - 0.424 \times 58.3 = 49.3$.

Hence the second approximation to $a'$ is 40.7, which agrees very closely with
the equivalent angle.

On the majority of the plots, the second approximation differs by only a
trivial amount from the equivalent angle. The plots with the three lowest
percentages (2.1, 2.5, and 5.0) have increased somewhat more. So have one or two
other plots where the angles deviated considerably from the expected values.
A third set of approximations was not considered necessary.

The analysis of variance of the second approximations is given in Table IV.

TABLE IV

               Degrees of freedom   Sum of squares   Mean squares
Blocks                 5                709.79
Treatments             6              1,531.56          255.26
Error                 30                982.67           32.76


Taking $n$ as 36.5, the expected value of the error mean square is
$820.7/36.5 = 22.48$, since in degrees each plot carries the weight
$n\pi^2/8100 = n/820.7$. Thus $\chi^2 = 982.67/22.48 = 43.71$, with 30 degrees of
freedom, which is almost exactly at the 5 percent level. This, together with the
appreciable amount of the variance removed by blocks, indicates that the
experimental error probably contains some element other than binomial variation.
As in the preceding case, it would be wise to make the usual analysis of variance
tests with the actual error mean square.
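The binomial $\chi^2$ test follows from the weight in degrees, $n\pi^2/8100$, used in (23). The sketch repeats the even-degrees-of-freedom tail formula so that it stands alone. It takes $n = 36.5$, as in the text, and treats the small plot-to-plot variation in $n$ as negligible. The helper name is illustrative.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x) for even df, via the Poisson-sum form."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j
        total += term
    return math.exp(-half) * total

n = 36.5                                  # average number of ears per plot
weight = n * math.pi ** 2 / 8100          # weight of an angle measured in degrees, from (23)
expected_ms = 1 / weight                  # (8100 / pi^2) / n, i.e. 820.7 / n
error_ss, error_df = 982.67, 30           # Table IV
chi2 = error_ss * weight                  # error SS divided by its expected mean square
p_value = chi2_upper_tail_even_df(chi2, error_df)
print(f"8100/pi^2 = {8100 / math.pi ** 2:.1f}, expected MS = {expected_ms:.2f}")
print(f"chi2 = {chi2:.2f} on {error_df} d.f., P = {p_value:.3f}")
```

Carried at full precision, $\chi^2$ is 43.70 rather than 43.71, because the text divides by the rounded 22.48. The tail probability is just above 0.05.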

6. Discussion. It must be emphasized that the solutions given above apply to
the case where the whole of the experimental error variation is of the Poisson or
binomial type. The methods are therefore likely to be useful in practice only
where the experimental conditions have been carefully controlled, or where the
data are derived from such small numbers that the Poisson or binomial variation
is much larger than any extraneous variation. The $\chi^2$ test is helpful in
deciding whether this assumption is justified. Further, the examples worked above
indicate that the transformed values form very good approximations on most plots.
It will often be sufficient to adjust only those plots which give zero or very
small values in the Poisson case, or zero or 100 percent values in the binomial
case. In this connection the method of adjustment given above may perhaps be
considered as an improvement on the empirical rule given by Bartlett [13] of
counting $n$ out of $n$ as $(n - \tfrac{1}{4})$ out of $n$.

Where extraneous variation becomes important, as is probably the normal case
with data derived from field experiments, there seem to be no theoretical grounds
for using the adjusted values. If we were prepared to describe accurately the
nature of the variation other than that of the Poisson or binomial type, a new set
of maximum likelihood equations could be developed. These would, however, lead to
a different type of adjustment.

In this case, the justification for the use of transformations has no direct
relation to the Poisson or binomial laws. The same is true where percentages are
derived from the ratios of two weights or volumes, as in chemical analyses, or
from an arbitrary observational scoring. With percentages, for example, it may be
said, without describing the experimental variation in detail, that the variance
must vanish at zero and 100 percent and is likely to be greatest in the middle.
The formula $V = \lambda PQ$ is at least a first approximation to this situation.
The angular transformation will approximately equalize a distribution of
variances of this type, provided that $\lambda$ is sufficiently small. We have, of
course, returned to an “approximate” type of argument. It follows that the
original data should be scrutinized carefully before deciding that a
transformation is necessary. Any presumed opinions about the nature of the
experimental variation should also be verified as far as possible.
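A first-order (delta-method) argument shows why $V = \lambda PQ$ is equalized by the angular transformation. The slope of $\sin^{-1}\sqrt{p}$ is $1/(2\sqrt{PQ})$, so the approximate variance of the angle is $\lambda/4$ in radians, whatever the value of $P$. The approximation is only good when $\lambda$ is small, which matches the proviso in the text. The choice $\lambda = 1/36.5$ below is illustrative. It corresponds to pure binomial variation with the corn example's average $n$, and in degrees it reproduces the expected mean square 22.48 used there.

```python
import math

def delta_var_angle(P: float, lam: float) -> float:
    """First-order variance of arcsin(sqrt(p)), in radians squared, when Var(p) = lam * P * Q."""
    slope = 1.0 / (2.0 * math.sqrt(P * (1.0 - P)))   # d/dp arcsin(sqrt(p)) evaluated at p = P
    return slope ** 2 * lam * P * (1.0 - P)

lam = 1 / 36.5                                       # binomial case: lam = 1/n
variances = [delta_var_angle(P, lam) for P in (0.05, 0.2, 0.5, 0.8)]
in_degrees = math.degrees(1.0) ** 2 * lam / 4        # (180/pi)^2 * lam / 4
print(variances, round(in_degrees, 2))
```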

7. Summary. This paper discusses the theoretical basis for the use of the square
root and inverse sine transformations in analyzing data whose experimental errors
follow the Poisson and binomial frequency laws respectively.

The maximum likelihood equations of estimation are developed for each case, but
are in general too complicated for frequent use. Suppose, however, that the
expected yield of any plot is assumed to be an additive function of the treatment
and soil effects in the transformed scale. A transformation can then be found so
that the equations of estimation assume the simple “normal theory” form. The
transforms are closely related to the square roots and inverse sines
respectively.

The nature of the assumed formula for the expected values is briefly discussed,
and a $\chi^2$ test is developed for the combined hypotheses that the prediction
formula is satisfactory and that the experimental errors follow the assumed law.

Numerical examples are worked for both types of transformation. These indicate
that even for data derived from small numbers, the square roots or inverse sines
are good estimates of the correct transforms on almost all plots. The exceptions
are plots which give zero yields in the Poisson case, or percentages near zero or
100 in the binomial case.
In practice, these new methods are not recommended to supplant the simple
transformations for general use, because it can seldom be assumed that the whole
of the experimental error variation follows the Poisson or binomial laws. The
more exact analysis may, however, be useful (i) for cases in which the plot yields
are very small integers or the ratios of very small integers, and (ii) in showing
how to give proper weight to an occasional zero plot yield.
NOTES

This section is devoted to brief research and expository articles, notes on
methodology and other short items.


ORTHOGONAL POLYNOMIALS APPLIED TO LEAST SQUARE FITTING
OF WEIGHTED OBSERVATIONS

By Bradford F. Kimball

1. Introduction. Let the independent variable be denoted by $x$, and let it
range over $n$ consecutive integral values $x_1$ to $x_n$. Thus $x$ represents the
index-number of the ordered intervals at which observations are taken. The
intervals are all of equal length, and an index-number is assigned in consecutive
order to every interval within the range of investigation, whether observations
occur in that interval or not. Let $y_x$ denote the observation measure (usually
referred to as observed value), if such observation exists. Let $w_x$ denote the
weight of that observation, with weight zero assigned where observations are
lacking.

To shorten the notation, summation over all values of $x$ from $x_1$ to $x_n$
will be denoted by the sign $\sum$. If a subscript and superscript is used, the
context will indicate the variable to which the summation refers. The $r$th
binomial coefficient will be denoted by $\binom{x}{r}$.


A system of polynomials $\phi_r(x)$, $r = 0, 1, 2, 3, \ldots$, of degree $r$ in
$x$ is said to be an orthogonal system, for the purposes of this paper, if they
satisfy the relations

(1) $\sum w_x\phi_r(x)\phi_s(x) = 0$ for $r \neq s$, and $\neq 0$ for $r = s$.

To construct the polynomials, one may write them in the form

(2) $\phi_0(x) = f_0(x) = \text{constant}$, $\quad \phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} b_i\,\phi_i(x)$, $\quad r = 1, 2, 3, \ldots$,

where the $b_i$ are constants and the $f_r(x)$ are arbitrary polynomials of degree
$r$. It then follows from the conditions of orthogonality that

(3) $b_i = \frac{\sum w_x f_r(x)\phi_i(x)}{\sum w_x[\phi_i(x)]^2}$.
Thus when the polynomials $f_r(x)$ have been chosen for all $r$, the system of
orthogonal polynomials for a given set of weights can be constructed and is
uniquely determined except for a constant factor [1].

By virtue of the relation (2) and the conditions of orthogonality (1), it follows
that

(4) $\sum w_x[\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x)$.

Define the function $\Phi(r, k)$ by

(5) $\Phi(r, k) = \sum w_x f_r(x)\phi_k(x)$, $\quad r = 0, 1, 2, 3, \ldots$.

It follows from the relations (2) and (3) that

(6) $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1}\frac{\Phi(r, i)}{\Phi(i, i)}\,\phi_i(x)$,

where it is to be noted that the coefficients $\Phi(r, i)/\Phi(i, i)$, being sums
over all values of $x$, are independent of $x$.

Define $q_r$ and $Y_r$ by

(7) $q_r = \sum w_x[\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x) = \Phi(r, r)$,

(8) $Y_r = \sum w_x y_x\phi_r(x)$.

Then if $u_r(x)$ represents the polynomial solution of degree $r$ of the normal
equations set up for observed values $y_x$ and weights $w_x$,

(9) $u_r(x) = \frac{Y_0}{q_0}\phi_0(x) + \frac{Y_1}{q_1}\phi_1(x) + \frac{Y_2}{q_2}\phi_2(x) + \cdots + \frac{Y_r}{q_r}\phi_r(x)$.

If $E_r^2$ denotes the weighted sum of the squares of the discrepancies between
the ordinates $u_r(x)$ of the fitted curve and the observed values $y_x$, then [2],

(10) $E_r^2 = \sum w_x[u_r(x) - y_x]^2 = \sum w_x y_x^2 - \sum_{i=0}^{r}\frac{Y_i^2}{q_i}$.

The practicability of the use of orthogonal polynomials is thus seen to depend
upon whether the quantities $\Phi(r, k)$ and $Y_r$ can be evaluated in a reasonably
simple manner.

The thesis of this paper is that if $f_r(x)$ is taken as the binomial coefficient
$\binom{x}{r}$, one can effectively apply the method of orthogonal polynomials.
This is made possible by the use of factorial moments in conjunction with an
adding machine that prints cumulative totals.
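The construction (5) to (10) can be sketched by storing each $\phi_r$ only through its values at the points $x$, since every quantity needed is a weighted sum over those points. This sketch takes $f_r(x) = \binom{x}{r}$ as proposed above, so the $x$ values must be non-negative integers. The function name and the data are illustrative only. A zero weight marks a missing observation, and the observed value there is ignored.

```python
import math

def orthogonal_fit(xs, ys, ws, degree):
    """Weighted least-squares polynomial via orthogonal polynomials, eqs. (5)-(9),
    with f_r(x) = C(x, r). Returns (phis, q, Y, fitted), phis[r] being phi_r at xs."""
    phis, q, Y = [], [], []
    for r in range(degree + 1):
        f_r = [math.comb(x, r) for x in xs]
        phi = [float(v) for v in f_r]
        for i in range(r):
            Phi_ri = sum(w * f * p for w, f, p in zip(ws, f_r, phis[i]))   # (5)
            phi = [a - Phi_ri / q[i] * b for a, b in zip(phi, phis[i])]    # (6)
        phis.append(phi)
        q.append(sum(w * p * p for w, p in zip(ws, phi)))                  # (7)
        Y.append(sum(w * y * p for w, y, p in zip(ws, ys, phi)))           # (8)
    fitted = [sum(Y[r] / q[r] * phis[r][j] for r in range(degree + 1))    # (9)
              for j in range(len(xs))]
    return phis, q, Y, fitted

# Illustrative data: x = 0..7, with weight 0 where no observation was taken.
xs = list(range(8))
ws = [1, 2, 0, 1, 3, 1, 0, 2]
ys = [1.0, 2.2, 0.0, 3.9, 5.1, 5.8, 0.0, 8.3]   # y is ignored where w = 0

phis, q, Y, fitted = orthogonal_fit(xs, ys, ws, degree=2)
E2_direct = sum(w * (u - y) ** 2 for w, u, y in zip(ws, fitted, ys))
E2_formula = sum(w * y * y for w, y in zip(ws, ys)) - sum(Yi ** 2 / qi for Yi, qi in zip(Y, q))
print(round(E2_direct, 6), round(E2_formula, 6))   # the two agree, as (10) asserts
```

The weighted sums of $\phi_r\phi_s$ vanish for $r \neq s$. If the observations happen to lie exactly on a quadratic, the degree-2 fit reproduces it. This brute-force route is what the factorial moments of the next section are designed to avoid.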

In treating the same problem Aitken sets up the normal equations in terms of
factorials, but considers the explicit use of orthogonal polynomials impractical.
He writes: “the arbitrary nature of the weights stands in the way of any
analytical sophistication; orthogonal polynomials emerge, but are not of great
use; and the necessity of solving the moment equations cannot be circumvented”
[3]. He prefers a determinantal method of solution of the normal equations. That
method is elegant from a theoretical standpoint, but the writer has found it more
involved than the present method from a practical point of view.

Thus although the present method is not new from the point of view of theory,
the writer has found that forms made up by the use of the technique suggested
below offer an effective method for fitting polynomial curves to weighted
observations.


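2. Simplification of the problem when $f_r(x) = \binom{x}{r}$. Factorial moments
$S_r$ and $M_r$ are defined by

(11) $S_r = \sum w_x\binom{x}{r}$, $\quad M_r = \sum w_x y_x\binom{x}{r}$.

These moments are not difficult to compute and are readily checked as computed.
Formula (5) for $\Phi(r, k)$ then becomes

(12) $\Phi(r, k) = \sum w_x\binom{x}{r}\phi_k(x)$.

Thus since $\phi_0(x) = 1$, $\Phi(r, 0) = \sum\binom{x}{r}w_x = S_r$, and hence

$\phi_1(x) = \binom{x}{1} - \frac{S_1}{S_0}$.

Again, using $x\binom{x}{r} = (r + 1)\binom{x}{r+1} + r\binom{x}{r}$,

$\Phi(r, 1) = \sum w_x\binom{x}{r}\left(x - \frac{S_1}{S_0}\right) = (r + 1)S_{r+1} + rS_r - \frac{S_1S_r}{S_0}$.

Hence

$q_1 = \Phi(1, 1) = 2S_2 + \left(1 - \frac{S_1}{S_0}\right)S_1$.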

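A recursion formula for $\Phi(r, k)$ may be obtained by expanding $\phi_k(x)$ in formula (12) by means of the relation $\phi_k(x) = \binom{x}{k} - \sum_{i=0}^{k-1} \frac{\Phi(k, i)}{q_i}\phi_i(x)$. Thus

(13) $$\Phi(r, k) = \sum_x w_x \binom{x}{r}\binom{x}{k} - \sum_{i=0}^{k-1} \frac{\Phi(r, i)\,\Phi(k, i)}{q_i}.$$

The first term can easily be expressed as a linear combination of binomial coefficients, and thus as a linear combination of the moments $S_j$: since $\binom{x}{r}\binom{x}{k} = \sum_{i=0}^{k}\binom{k}{i}\binom{r+i}{k}\binom{x}{r+i}$, it equals $\sum_{i=0}^{k}\binom{k}{i}\binom{r+i}{k}S_{r+i}$.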
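The formula for $Y_r$ can be broken down as follows:

$$Y_0 = \sum_x w_x y_x = M_0,$$

$$Y_r = \sum_x w_x y_x \phi_r(x) = M_r - \sum_{i=0}^{r-1} \frac{\Phi(r, i)}{q_i}\,Y_i.$$

Thus

$$Y_1 = M_1 - \frac{\Phi(1, 0)}{q_0}\,Y_0, \qquad Y_2 = M_2 - \frac{\Phi(2, 1)}{q_1}\,Y_1 - \frac{\Phi(2, 0)}{q_0}\,Y_0, \quad \text{etc.}$$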


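The formulas of this section can be exercised end to end with a short Python sketch. It is not the author's worksheet: it assumes non-negative integer abscissas, replaces the calculating machine with exact `fractions.Fraction` arithmetic, and the names `fit_orthogonal`, `phi_values` and `fitted` are hypothetical. It builds the factorial moments (11), obtains $\Phi(r, k)$ from recursion (13) together with the binomial product identity, takes $q_r = \Phi(r, r)$ (which follows from orthogonality), and forms $Y_r$ from the recursion above.

```python
from fractions import Fraction
from math import comb


def fit_orthogonal(xs, ys, ws, degree):
    """Weighted least squares with orthogonal polynomials built on f_r(x) = C(x, r).

    xs must be non-negative integers. Returns (Phi, q, Y) with Phi[r][k] = Phi(r, k),
    q[k] = q_k and Y[k] = Y_k, all as exact Fractions.
    """
    xs = list(xs)
    ys = [Fraction(y) for y in ys]
    ws = [Fraction(w) for w in ws]
    # Factorial moments (11); S_r is needed up to r = 2 * degree.
    S = [sum(w * comb(x, r) for x, w in zip(xs, ws)) for r in range(2 * degree + 1)]
    M = [sum(w * y * comb(x, r) for x, y, w in zip(xs, ys, ws)) for r in range(degree + 1)]

    def cross(r, k):
        # sum_x w_x C(x, r) C(x, k), rewritten as a combination of the moments S_j
        return sum(comb(k, i) * comb(r + i, k) * S[r + i] for i in range(k + 1))

    Phi = [[Fraction(0)] * (degree + 1) for _ in range(degree + 1)]
    q = [Fraction(0)] * (degree + 1)
    Y = [Fraction(0)] * (degree + 1)
    for r in range(degree + 1):
        for k in range(r + 1):
            # Recursion (13)
            Phi[r][k] = cross(r, k) - sum(Phi[r][i] * Phi[k][i] / q[i] for i in range(k))
        q[r] = Phi[r][r]
        # Y_r = M_r - sum_{i<r} [Phi(r, i) / q_i] Y_i
        Y[r] = M[r] - sum(Phi[r][i] / q[i] * Y[i] for i in range(r))
    return Phi, q, Y


def phi_values(x, Phi, q):
    """phi_0(x), ..., phi_r(x) from phi_k = C(x, k) - sum_{i<k} [Phi(k, i) / q_i] phi_i."""
    vals = []
    for k in range(len(q)):
        vals.append(comb(x, k) - sum(Phi[k][i] / q[i] * vals[i] for i in range(k)))
    return vals


def fitted(x, Phi, q, Y):
    """u_r(x) = sum_k (Y_k / q_k) phi_k(x)."""
    return sum(Y[k] / q[k] * p for k, p in enumerate(phi_values(x, Phi, q)))


xs, ws, ys = range(6), [1, 2, 1, 3, 1, 2], [3, 1, 4, 1, 5, 9]
Phi, q, Y = fit_orthogonal(xs, ys, ws, degree=2)
print([float(fitted(x, Phi, q, Y)) for x in xs])
```

Natural checks on such a sketch are that $\sum_x w_x \phi_i(x)\phi_j(x)$ vanishes for $i \neq j$ and equals $q_i$ for $i = j$, that the residual sum of squares agrees with (10), and that data lying exactly on a cubic are reproduced exactly by a third-degree fit.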
3. General technique of computation. In determining the best fitting polynomial of degree $r$, the ratios $\Phi(r, i)/q_i$ are seen to play an important part. In a form for calculation, these quantities should receive simple designations such as $b_i$ for a second degree curve, $c_i$ for a third degree curve, etc. Suppose they are designated by $R_i$ for a curve of degree $r$; then

(16) $$Y_r = M_r - \sum_{i=0}^{r-1} R_i Y_i,$$

(17) $$q_r = \sum_x w_x \binom{x}{r}^2 - \sum_{i=0}^{r-1} R_i\,\Phi(r, i),$$

and in determining $\Phi(r, k)$ for $k = 0, 1, 2, \ldots, r - 1$, formula (13) may be written

(18) $$\Phi(r, k) = \sum_{i=0}^{k} \binom{k}{i}\binom{r+i}{k} S_{r+i} - \sum_{i=0}^{k-1} R_i\,\Phi(k, i).$$

The fact that these quantities $R_i$ appear as multipliers in so many of the fundamental formulas greatly simplifies the mechanics of the calculation, especially when a calculating machine is used.

In the final determination of the polynomial curve, the differences of the polynomial at $x = 0$ are readily determined, since the leading term of each orthogonal polynomial is a binomial coefficient, and thus

(19) $$\Delta^k \phi_r(0) = -\sum_{i=k}^{r-1} R_i\,\Delta^k \phi_i(0), \quad k = 1, 2, 3, \ldots, r - 1; \qquad \Delta^r \phi_r(0) = 1.$$
Since the effectiveness of the method depends upon the availability of an adding machine which records a cumulative subtotal, the determination of the curve from the differences at the point $x = 0$ is not a hardship, and indeed affords a quick and accurate means of setting up the curve for purposes of plotting and checking:

(20) $$u_r(0) = \frac{Y_0}{q_0} + \frac{Y_1}{q_1}\phi_1(0) + \frac{Y_2}{q_2}\phi_2(0) + \cdots + \frac{Y_r}{q_r}\phi_r(0),$$

$$\Delta^k u_r(0) = \frac{Y_k}{q_k} + \frac{Y_{k+1}}{q_{k+1}}\Delta^k \phi_{k+1}(0) + \cdots + \frac{Y_r}{q_r}\Delta^k \phi_r(0),$$

$$\Delta^r u_r(0) = \frac{Y_r}{q_r}.$$

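The adding-machine procedure amounts to tabulating a polynomial from its forward differences at $x = 0$. A minimal Python sketch, with the hypothetical name `tabulate_from_differences`: at each step every difference is replaced by itself plus the next higher difference (the $r$th difference stays constant), and the running value is read off.

```python
def tabulate_from_differences(diffs, n_points):
    """Tabulate u(0), u(1), ..., u(n_points - 1) from [u(0), du(0), ..., d^r u(0)].

    Mimics cumulative addition: after recording u(x), each difference is
    replaced by itself plus the next higher one, giving the differences at x + 1.
    """
    d = list(diffs)
    values = []
    for _ in range(n_points):
        values.append(d[0])
        for k in range(len(d) - 1):
            d[k] += d[k + 1]   # d[k + 1] has not been updated yet at this point
    return values


# u(x) = 2x^2 - 3x + 1 has u(0) = 1, du(0) = -1, d^2 u = 4.
print(tabulate_from_differences([1, -1, 4], 5))   # [1, 0, 3, 10, 21]
```

Updating the differences from the lowest order upward is what lets each column use the previous step's higher difference, exactly as a running subtotal would.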
The advantage of the use of orthogonal polynomials becomes particularly apparent when error formulae are to be used. The formula for the sum of the squares of the discrepancies, denoted by $E^2$, is given above (formula (10)). The estimated variance $V$ of the weighted observations about the fitted curve is thus $E^2/(n - r - 1)$, where $n$ is the number of values of $x$ used in fitting and $r$ is the degree of the curve fitted. Recalling that the matrix of the normal equations is of diagonal form with diagonal elements $q_0, q_1, \ldots, q_r$, it follows that the coefficient $Y_k/q_k$ of $\phi_k(x)$ in the expansion of $u_r(x)$ has the variance $V/q_k$.

Furthermore, the variance of the ordinate of the fitted curve $u_r(x)$ at a point $x$, due to sampling variations in the determination of the coefficients of the curve and under the assumption that the weights and the values of the independent variable $x$ do not involve errors, has the simple form

(21) $$\text{Variance of } u_r(x) \text{ at point } x = V\left[\frac{\phi_0^2(x)}{q_0} + \frac{\phi_1^2(x)}{q_1} + \cdots + \frac{\phi_r^2(x)}{q_r}\right],$$

since the covariances of the coefficients of the orthogonal polynomials are zero [4].
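The error formulas reduce to a few lines of arithmetic once $E^2$, the $q_k$ and the values $\phi_k(x)$ are known. A minimal sketch; the function name and argument layout are hypothetical:

```python
def error_summary(E2, n, r, q, phi_at_x):
    """Return (V, variances of the coefficients Y_k/q_k, variance of u_r at x).

    E2: weighted sum of squared discrepancies, formula (10); n: number of x values;
    r: degree; q: [q_0, ..., q_r]; phi_at_x: [phi_0(x), ..., phi_r(x)].
    """
    V = E2 / (n - r - 1)                                   # variance about the curve
    coef_var = [V / qk for qk in q]                        # Var(Y_k / q_k) = V / q_k
    ordinate_var = V * sum(p * p / qk for p, qk in zip(phi_at_x, q))   # formula (21)
    return V, coef_var, ordinate_var


print(error_summary(6.0, 6, 2, [4.0, 2.0, 1.0], [1.0, 0.5, -1.0]))   # (2.0, [0.5, 1.0, 2.0], 2.75)
```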
COMBINATORIAL FORMULAS FOR THE rth STANDARD MOMENT
OF THE SAMPLE SUM, OF THE SAMPLE MEAN,

AND OF THE NORMAL CURVE

By P. S. Dwyer


The standard moments of the normal curve are usually expressed by the two statements [1, p. 97]

(1) $$\alpha_{2s} = \frac{(2s)!}{2^s\,s!}, \qquad \alpha_{2s+1} = 0.$$


It is of some interest to note that these two statements may be generalized into a single statement by observing that $\frac{(2s)!}{2^s s!}$ is the number of ways in which $2s$ things can be grouped in pairs and that 0 is the number of ways in which $2s + 1$ things can be grouped in pairs. It is obvious that an odd number of things cannot be grouped in pairs, since there must be at least one unpaired unit. It is clear, too, that the number of orders in which $2s$ things can be grouped in pairs is

$$\binom{2s}{2}\binom{2s-2}{2}\cdots\binom{2}{2} = \frac{(2s)!}{2^s},$$

and if the resulting paired groups (rather than the orders of grouping) are counted, it is seen that each paired grouping is repeated $s!$ times, so that $\frac{(2s)!}{2^s s!}$ represents the number of ways $2s$ things can be grouped in pairs. If we arbitrarily define the number of ways 0 things can be grouped in pairs to be 1 (or if we limit our theorem to values of $r > 0$), we may say: "The $r$th standard moment of the normal curve is equal to the number of ways in which $r$ things can be grouped in pairs."

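The pairing interpretation is easy to check by brute force. In this Python sketch (hypothetical names), `pairings` enumerates every grouping of a list into unordered pairs, `pair_count` is the closed form above, and `normal_moment` uses the standard recurrence $E[Z^r] = (r-1)E[Z^{r-2}]$ for a standard normal variable, a known property brought in here for comparison rather than taken from this note.

```python
from math import factorial


def pairings(items):
    """Yield every grouping of `items` into unordered pairs (none when the length is odd)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def pair_count(r):
    """r! / (2^(r/2) (r/2)!) for even r and 0 for odd r; equals 1 for r = 0."""
    if r % 2:
        return 0
    s = r // 2
    return factorial(2 * s) // (2 ** s * factorial(s))


def normal_moment(r):
    """E[Z^r] for a standard normal Z via E[Z^r] = (r - 1) E[Z^(r - 2)]."""
    if r == 0:
        return 1
    if r == 1:
        return 0
    return (r - 1) * normal_moment(r - 2)


print([pair_count(r) for r in range(0, 11, 2)])   # [1, 1, 3, 15, 105, 945]
```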
As presented above, the combination representation is used primarily as a means of unification of results. However, it is possible to derive the standard moments of the normal curve in such a way as to indicate the term $\frac{r!}{2^{r/2}(r/2)!}$ early in the proof and to trace it throughout the proof. I follow the method outlined by H. C. Carver [2] in obtaining the normal distribution as the limit of the distribution of sample sums (or of sample means), though I use a somewhat different notation [3, p. 6]. If we let the combinatorial symbol associated with $r$ and the partition $p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}$ represent the number of ways in which $r$ units can be collected with $\pi_1$ groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc., then the multinomial theorem can be expressed as [3, p. 17]


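(2) $$(x_1 + x_2 + \cdots + x_n)^r = \sum \frac{r!}{(p_1!)^{\pi_1}\pi_1!\,(p_2!)^{\pi_2}\pi_2!\cdots(p_s!)^{\pi_s}\pi_s!}\,(p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}),$$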
where the summation is taken over all possible partitions $p_1^{\pi_1} \cdots p_s^{\pi_s}$ of $r$, and the expression $(p_1^{\pi_1} \cdots p_s^{\pi_s})$ represents the power product form [3, p. 14], which is $\pi_1!\,\pi_2! \cdots \pi_s!$ times the monomial symmetric function. If $p$ represents the number of parts of the partition, then

$$p = \pi_1 + \pi_2 + \cdots + \pi_s,$$

while

$$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s.$$

Now it can be shown from (2), in the case of infinite sampling, that



and since $\alpha_1 = 0$, it is only necessary to sum over all partitions which have no unit part. We have then, dividing by



We have now a formula for the $r$th standard moment of the sample sum which is expressed essentially in combination notation, since the combinatorial coefficient represents the number of ways in which $r$ units can be grouped to form $\pi_1$ groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc. All non-unitary groupings of $r$ are formed, each combinatorial coefficient is computed and multiplied by the corresponding factor in $n$ and by the product of the corresponding $\alpha$'s, and the sums are formed. It might be noted that the formula for the $r$th standard moment of the sample mean is identical with (4), while the corresponding finite sampling (without replacement) formula is



(5) 





N'Ppri. 





The $P$'s are defined in previous papers [2, pp. 105-6], [3, p. 113].

We obtain the formula for the $r$th standard moment of the normal curve by taking the limit of (4) as $n \to \infty$. (H. C. Carver has pointed out [2, p. 121] that this method of derivation imposes fewer restrictions than does the derivation from Hagen's hypothesis.) Each partition term will approach zero as $n$ approaches infinity if $p < r/2$. Now the only non-unitary partition in which $p$ is not less than $r/2$ is the partition $2^{r/2}$, and we can have this partition only when $r$ is even. Now the limit as $n$ approaches infinity of $n(n-1)\cdots(n - r/2 + 1)/n^{r/2}$ is unity, and we have, in the limiting case,

(6) $$\alpha_r = \begin{cases} \dfrac{r!}{2^{r/2}\,(r/2)!} & \text{if } r \text{ is even}, \\[1ex] 0 & \text{if } r \text{ is odd}. \end{cases}$$
Since $\frac{r!}{2^{r/2}(r/2)!}$ is the number of ways $r$ units can be grouped in pairs when $r$ is even, and since 0 is the number of ways $r$ units can be grouped in pairs when $r$ is odd, it follows that the $r$th standard moment of the normal curve is the number of ways in which $r$ units can be grouped in pairs.

This development is of interest in that it makes possible the tracing of the value $\frac{r!}{2^{r/2}(r/2)!}$ back through the various stages of the development to the coefficient of $(2^{r/2})$ in the power product expansion of the multinomial theorem.
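The limiting argument can be watched numerically. The sketch below assumes one particular parent population, independent variables equal to +1 or -1 with probability 1/2 each (our choice for illustration, not the author's), computes the exact $r$th standard moment of the sample sum, and shows it approaching the pairing count; for $r = 4$ the exact value is $3 - 2/n$. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def standard_moment_of_sum(n, r):
    """Exact r-th standard moment of e_1 + ... + e_n, where the e_i are independent
    and equal +1 or -1 with probability 1/2 each (so the sum has variance n)."""
    raw = Fraction(sum(comb(n, k) * (2 * k - n) ** r for k in range(n + 1)), 2 ** n)
    if r % 2:
        return raw                      # zero by symmetry
    return raw / n ** (r // 2)


print([float(standard_moment_of_sum(n, 4)) for n in (1, 10, 100)])   # [1.0, 2.8, 2.98]
```

Only the partition $2^{r/2}$ survives in the limit, which is why the fourth moment tends to 3 and the sixth to 15 whatever the (finite-moment) parent.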
ON A METHOD OF SAMPLING¹


By E. G. Olds


It is recorded that Diogenes fared forth with a lantern in his search for an honest man. History does not tell us how many dishonest men he encountered before he found the first honest one but, judging from the fact that he took his lantern, apparently he expected to have a long search. The general problem of sampling inspection, of which the above is a special case, can be stated as follows:

Given a lot, of size $m$, containing $s$ items of a specified kind. If items are to be drawn without replacement until $t$ of the $s$ items have been drawn, how many drawings, on the average, will be necessary?

Uspensky² has solved a problem concerning balls in an urn, from which the answer to the above question can be obtained for the special case $t = 1$. For the general case, the distribution for the number $n$ of the drawing in which the $t$th specified item appears is given by the terms of the series

(1) $$\sum_n y_n = \sum_{n=t}^{m-s+t} \frac{C_{n-1,\,t-1}\;C_{m-n,\,s-t}}{C_{m,\,s}} = 1,$$


¹ Presented to the Institute of Mathematical Statistics, Dec. 27, 1938, at Detroit, Mich., as part of a paper entitled "Remarks on two methods of sampling inspection."

² J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937, p. 178.
where the first symbol indicates the number of ways of choosing $t - 1$ of the specified items to fill the first $n - 1$ places, the second symbol indicates the number of ways of disposing of the remaining $s - t$ specified items in the last $m - n$ places, and the denominator gives the number of ways that the $s$ items can be scattered through the lot. In order to get the average number of draws we multiply $y_n$ by $n$ and sum. Since $n\,C_{n-1,t-1} = t\,C_{n,t}$ and $C_{m,s} = \frac{s+1}{m+1}\,C_{m+1,s+1}$, we have

(2) $$\bar{n} = \sum_n n\,y_n = \frac{t(m+1)}{s+1}\sum_n \frac{C_{n,\,t}\;C_{m-n,\,s-t}}{C_{m+1,\,s+1}} = \frac{t(m+1)}{s+1}.$$

Example 1. On a table of 200 bargain shirts there are 5 which have a 15 in. 
neckband and 36 in. sleeves. How many shirts must be examined, on the 
average, to find two of the desired kind? 

Solution. For this case, $m = 200$, $s = 5$, $t = 2$. Therefore $\bar{n} = 2(201)/6 = 67$. Thus, an average of 67 shirts must be examined.

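Series (1) and the mean (2) can be checked by direct summation. The sketch uses exact fractions, writes $C_{a,b}$ as Python's `math.comb(a, b)`, and the helper names `draw_distribution` and `mean_draws` are hypothetical.

```python
from fractions import Fraction
from math import comb


def draw_distribution(m, s, t):
    """P(the t-th specified item turns up on draw n), following series (1)."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, t - 1) * comb(m - n, s - t), total)
            for n in range(t, m - s + t + 1)}


def mean_draws(m, s, t):
    """Closed form (2): t(m + 1) / (s + 1)."""
    return Fraction(t * (m + 1), s + 1)


dist = draw_distribution(200, 5, 2)          # Example 1
print(sum(dist.values()), sum(n * p for n, p in dist.items()), mean_draws(200, 5, 2))
# prints: 1 67 67
```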
Suppose $\mu_k$ represents the $k$th moment about the mean, $\nu'_k$ the $k$th moment about the origin, and $\nu_k$ the moment relation given by

(3) $$\nu_k = (\nu' + k - 1)^{(k)},$$

where $(\nu' + k - 1)^{(k)}$ represents the result of expanding the factorial $(\nu' + k - 1)(\nu' + k - 2)\cdots\nu'$ in powers of $\nu'$ and changing each exponent of $\nu'$ to the corresponding subscript. (For example, $\nu_3 = (\nu' + 2)^{(3)} = \nu'_3 + 3\nu'_2 + 2\nu'_1$.) Thus $\nu_k$ is the mean of $n(n+1)\cdots(n+k-1)$. It is easy to derive the recurrence relation

(4) $$\nu_k = \frac{(t + k - 1)(m + k)}{s + k}\,\nu_{k-1}.$$


From this result the computation of the moments about the mean is theoretically direct. Actually the results do not seem to be very compact. The variance is given by

(5) $$\mu_2 = \frac{(m+1)(m-s)}{(s+1)^2(s+2)}\left[t(s+1) - t^2\right].$$

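A sketch that checks (4) and (5) against exact sums over the distribution (1), for one small lot chosen arbitrarily ($m = 20$, $s = 6$, $t = 3$). It takes $\nu_k$ to be the mean of $n(n+1)\cdots(n+k-1)$, which is how (3) reads; all helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def draw_distribution(m, s, t):
    """P(the t-th specified item turns up on draw n), following series (1)."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, t - 1) * comb(m - n, s - t), total)
            for n in range(t, m - s + t + 1)}


def rising(n, k):
    """n(n + 1)...(n + k - 1), the quantity whose mean is nu_k."""
    out = 1
    for j in range(k):
        out *= n + j
    return out


def nu_by_recurrence(m, s, t, kmax):
    """nu_0 = 1 and recurrence (4): nu_k = (t + k - 1)(m + k)/(s + k) * nu_{k-1}."""
    nu = [Fraction(1)]
    for k in range(1, kmax + 1):
        nu.append(nu[-1] * Fraction((t + k - 1) * (m + k), s + k))
    return nu


def variance_formula(m, s, t):
    """(5): (m + 1)(m - s)[t(s + 1) - t^2] / ((s + 1)^2 (s + 2))."""
    return Fraction((m + 1) * (m - s) * (t * (s + 1) - t * t), (s + 1) ** 2 * (s + 2))


m, s, t = 20, 6, 3
dist = draw_distribution(m, s, t)
exact_nu = [sum(rising(n, k) * p for n, p in dist.items()) for k in range(5)]
exact_var = sum(n * n * p for n, p in dist.items()) - exact_nu[1] ** 2
print(exact_nu == nu_by_recurrence(m, s, t, 4), exact_var == variance_formula(m, s, t))
# prints: True True
```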

In case $s$ is unknown and $n$ is known for a particular value of $t$, we may estimate $s$ (or rather $\frac{1}{s+1}$) by using the relation $n = \frac{t(m+1)}{s+1}$. Then

(6) $$\text{est.}\ \frac{1}{s+1} = \frac{n}{t(m+1)},$$

and the variance, using this estimate, is given by

(7) $$\text{Variance of est.}\ \frac{1}{s+1} = \frac{n}{n + t(m+1)}\cdot\frac{n - t}{t^2(m+1)}\left(1 - \frac{n}{m+1}\right).$$


Example 2. In order to check a box of 144 screws, screws are drawn until 10 good screws are obtained. In a particular case only 10 drawings were necessary. Estimate the number of good screws in the lot.

Solution. Here $m = 144$, $t = 10$, $n = 10$. The estimate for $s$ is obtained from $\text{est.}\ \frac{1}{s+1} = \frac{10}{10(145)} = \frac{1}{145}$, so that, as might be expected, the conclusion is that all the screws are good. Furthermore, the variance of the estimated quantity is zero.

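The estimate (6) and its variance can be computed as follows. This sketch obtains the variance by substituting the estimate of $1/(s+1)$ into (5) and dividing by $(t(m+1))^2$, written in terms of the observed $n$; that algebra is our own, and the helper names are hypothetical. For Example 2 it returns $s = 144$ with zero variance.

```python
from fractions import Fraction


def estimate_reciprocal(n, m, t):
    """(6): estimate of 1/(s + 1) from the observed number of draws n."""
    return Fraction(n, t * (m + 1))


def estimate_s(n, m, t):
    """Estimate of s implied by (6)."""
    return 1 / estimate_reciprocal(n, m, t) - 1


def variance_of_estimate(n, m, t):
    """Var(n) from (5), divided by (t(m + 1))^2, with 1/(s + 1) replaced by its estimate."""
    return (Fraction(n, n + t * (m + 1)) * Fraction(n - t, t * t * (m + 1))
            * (1 - Fraction(n, m + 1)))


print(estimate_s(10, 144, 10), variance_of_estimate(10, 144, 10))   # prints: 144 0
```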
It is obvious that the number of draws necessary to obtain any particular number of specified items is correlated with the numbers of draws for lesser numbers of items. To investigate this, let us suppose that $n_j$ represents the number of draws to obtain exactly $j$ specified items and that $x_j = n_j - n_{j-1}$ (with $n_0 = 0$). It follows immediately from our previous results that

(8) $$E(x_1) = E(x_2) = \cdots = E(x_s) = \frac{m+1}{s+1}.$$

This result could be obtained from the fact that, corresponding to any arrangement of the lot for which $x_i = a$ and $x_j = b$, there is another arrangement where $x_i = b$ and $x_j = a$, formed by moving $a - b$ of the non-specified items from the first group to the second. From this fact we see, also, that

(9) $$E(x_1^2) = E(x_2^2) = \cdots = E(x_s^2).$$

But $x_1 = n_1$ and $\sigma_{n_1}^2 = \frac{(m+1)(m-s)\,s}{(s+1)^2(s+2)}$. Therefore, writing $d = \frac{(m+1)(m-s)}{(s+1)^2(s+2)}$,

(10) $$\sigma_{x_1}^2 = \sigma_{x_2}^2 = \cdots = \sigma_{x_s}^2 = ds.$$

But, from our previous formula, we have

$$\sigma_{n_2}^2 = d(2s - 2), \qquad \sigma_{n_3}^2 = d(3s - 6), \quad \text{etc.}$$

Since $n_2 = x_1 + x_2$, it follows that

$$\sigma_{n_2}^2 = \sigma_{x_1}^2 + \sigma_{x_2}^2 + 2r_{12}\,\sigma_{x_1}\sigma_{x_2},$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$. Therefore,

(11) $$r_{12} = -\frac{1}{s}.$$


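Also, since $x_1 = n_2 - x_2$, it follows that

(12) $$r_{n_2 x_2} = \sqrt{\frac{s - 1}{2s}}.$$

Likewise, from $x_2 = n_2 - x_1$, we get

(13) $$r_{n_2 x_1} = \sqrt{\frac{s - 1}{2s}}.$$

Finally, we obtain the three general results

(14) $$r_{n_i x_j} = -\sqrt{\frac{i}{s(s - i + 1)}}, \qquad i < j,$$

(15) $$r_{n_i x_j} = \sqrt{\frac{s - i + 1}{si}}, \qquad i \ge j,$$

(16) $$r_{n_i n_j} = \sqrt{\frac{i(s - j + 1)}{j(s - i + 1)}}, \qquad i < j.$$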

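The correlations (11) and (14)-(16) can be checked by brute force for a small lot. The sketch below (a verification aid, not part of the note; names are hypothetical) enumerates all $C_{m,s}$ equally likely placements of the specified items, computes exact covariances of $n_1, \ldots, n_s$, writes each $x_j$ as a linear combination of the $n$'s, and reports the squared correlation with its sign so that everything stays in exact fractions.

```python
from fractions import Fraction
from itertools import combinations


def covariance_of_draws(m, s):
    """Exact covariance matrix of (n_1, ..., n_s) over all C(m, s) placements."""
    placements = list(combinations(range(1, m + 1), s))   # sorted, so p[i] = n_{i+1}
    count = len(placements)
    mean = [Fraction(sum(p[i] for p in placements), count) for i in range(s)]
    return [[Fraction(sum(p[i] * p[j] for p in placements), count) - mean[i] * mean[j]
             for j in range(s)] for i in range(s)]


def n_vec(s, i):
    """Coefficients expressing n_i (1-based) in terms of n_1, ..., n_s."""
    v = [0] * s
    v[i - 1] = 1
    return v


def x_vec(s, j):
    """Coefficients of x_j = n_j - n_{j-1}, with n_0 = 0."""
    v = n_vec(s, j)
    if j > 1:
        v[j - 2] = -1
    return v


def corr_squared_and_sign(cov, u, v):
    """(r^2, sign of r) for the linear combinations u.n and v.n."""
    def c(a, b):
        return sum(a[i] * b[k] * cov[i][k] for i in range(len(a)) for k in range(len(b)))
    cuv = c(u, v)
    return cuv * cuv / (c(u, u) * c(v, v)), (cuv > 0) - (cuv < 0)


m, s = 12, 4
cov = covariance_of_draws(m, s)
print(corr_squared_and_sign(cov, x_vec(s, 1), x_vec(s, 2)))   # (Fraction(1, 16), -1)
```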

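Example 3. The cards of a deck are turned one by one until two aces have appeared. The second ace appears when the 36th card is turned. How many more cards should one expect to have to turn to find a third ace?

Solution. Here $m = 52$, $s = 4$, $t = 2$, $n_2 = 36$. Then

$$\bar{n}_2 = \frac{2(53)}{5}, \qquad \bar{n}_3 = \frac{3(53)}{5}, \qquad r_{n_2 n_3} = \sqrt{\frac{2 \cdot 2}{3 \cdot 3}} = \frac{2}{3},$$

and $\sigma_{n_2} = \sqrt{6d}$, $\sigma_{n_3} = \sqrt{6d}$. Since, for the regression estimate, $\dfrac{n_3 - \bar{n}_3}{\sigma_{n_3}} = r_{n_2 n_3}\,\dfrac{n_2 - \bar{n}_2}{\sigma_{n_2}}$, we have

$$n_3 - \bar{n}_3 = \frac{2}{3}\left(36 - \frac{106}{5}\right) = \frac{148}{15}, \qquad n_3 = \frac{159}{5} + \frac{148}{15} = \frac{125}{3},$$

so that one should expect to turn $\frac{125}{3} - 36 = \frac{17}{3}$, or about $5\frac{2}{3}$, more cards.

Of course this result could have been obtained more directly by noting that there were two aces left among the 16 remaining cards.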

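The arithmetic of Example 3 is reproduced below as a sketch with hypothetical function names. It combines (2), (5) and (16) into the regression estimate of $n_{i+1}$ from an observed $n_i$, and compares the result with the direct answer $(16 + 1)/(2 + 1)$ obtained by applying (2) to the 16 remaining cards.

```python
from fractions import Fraction
from math import sqrt


def mean_draws(m, s, t):
    """(2): mean draw on which the t-th specified item appears."""
    return Fraction(t * (m + 1), s + 1)


def var_draws(m, s, t):
    """(5): variance of that draw number."""
    return Fraction((m + 1) * (m - s) * t * (s + 1 - t), (s + 1) ** 2 * (s + 2))


def corr_draws(s, i, j):
    """(16): correlation of n_i and n_j for i < j."""
    return sqrt(Fraction(i * (s - j + 1), j * (s - i + 1)))


def predict_next(m, s, i, n_i):
    """Regression estimate of n_{i+1} given the observed n_i."""
    j = i + 1
    slope = corr_draws(s, i, j) * sqrt(var_draws(m, s, j) / var_draws(m, s, i))
    return float(mean_draws(m, s, j)) + slope * float(n_i - mean_draws(m, s, i))


extra = predict_next(52, 4, 2, 36) - 36
print(extra, float(mean_draws(16, 2, 1)))   # both about 5.667
```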

Conclusion. The results given in this note might be useful when it is necessary to estimate the number of items to be drawn in order to secure a desired number of a particular type, such as may be the case in obtaining a sample with previously defined characteristics. Also, the note disproves such intuitive notions as the one that, when looking for a desired record, one is most likely to have to search the whole pile to find it. As far as methods of sampling inspection are concerned, the one implied in this note has little to recommend it.

Carnegie Institute of Technology,

Pittsburgh, Pa.


RANK CORRELATION WHEN THERE ARE EQUAL VARIATES¹

By Max A. Woodbury
If there is given a set of number pairs

(1) $$(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N),$$

we may assign to each variate its "rank" (i.e., one more than the number of corresponding variates in the set greater than the given variate). In this way there is obtained a set of pairs of ranks

(2) $$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N).$$

¹ Presented at the fall meeting, Mich. section of the Math. Assn. of America, Nov. 18, 1939, Kalamazoo College.
If we assume that $X_i \neq X_j$ and $Y_i \neq Y_j$ when $i \neq j$, then it follows that each integer from 1 to $N$ appears once and only once in the $x$'s, and the same holds for the $y$'s. This leads at once to the formulas:

(3a) $$\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = \frac{N(N+1)}{2},$$

(3b) $$\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}.$$

When these results are substituted in the expression for the product moment correlation coefficient, we have after simplifying [1]

(4) $$\rho = 1 - \frac{6\sum_{i=1}^{N} D_i^2}{N(N^2 - 1)}, \qquad \text{where } D_i = x_i - y_i.$$

If we consider the case of equal variates and follow the rule for assigning ranks given in the first paragraph, the resulting method is known as the bracket-rank method. The use of (4) in the calculation of $\rho$ by this method is not strictly valid, because not every integer appears in the summations, and so neither (3a) nor (3b) is true.

The more accurate mid-rank method assigns to each of the equal variates the average of the ranks that would be assigned if we were to give them an arbitrary order. This method preserves (3a) but not (3b). In this paper $\rho_M$ indicates the value of $\rho$ as calculated by (4) when the mid-rank method is used.

In a method due to DuBois [2], the equal variates are assigned the same rank so as to satisfy (3b). In this case (3a) is not satisfied.

If we assign the ranks to the equal variates in an arbitrary way, then (3a) and (3b) are of course satisfied and the use of (4) is valid. There are two disadvantages to such a method: first, the equal variates are treated differently, and second, the assignment of ranks is arbitrary. These difficulties are removed if one uses the average of the values of $\rho$ corresponding to all possible ways of arbitrarily assigning ranks to the equal variates. Since $\rho$ is linear in $\sum D_i^2$, the average value of $\rho$ may be obtained from the average value of $\sum D_i^2$ by the use of (4).

Let us first consider the simple case of two equal variates in one of the variables, say $X$. It is clear that there are only two possible ways of assigning ranks, and that if we arrange the series by the assigned $x$ ranks, the resulting series differ only in the $y$ ranks corresponding to the equal $X$ variates. If we denote the two $x$ ranks to be assigned by $m$ and $m + 1$, and the $y$'s corresponding for a particular arrangement by $y_m$ and $y_{m+1}$, we have for the average $\sum D_i^2$ the expression

(5a) $$\sum{}'(x - y_x)^2 + \tfrac{1}{2}\left[(m - y_m)^2 + (m + 1 - y_{m+1})^2 + (m - y_{m+1})^2 + (m + 1 - y_m)^2\right],$$

where $\sum'$ extends over all ranks other than $m$ and $m + 1$. By the mid-rank method the corresponding expression is

(5b) $$\sum{}'(x - y_x)^2 + \left(m + \tfrac{1}{2} - y_m\right)^2 + \left(m + \tfrac{1}{2} - y_{m+1}\right)^2.$$

The correction $\Delta_2$ to be added to the mid-rank $\sum D_i^2$ to get the average $\sum D_i^2$ is, by subtracting (5b) from (5a) and simplifying,

(6) $$\Delta_2 = \tfrac{1}{2}.$$

To get $\Delta_K$ in the more general case of several equal variates, we need only consider the difference between the average value of $\sum D_i^2$ and that obtained by the mid-rank method. If there are $K$ equal $X$ variates, we may assign the ranks in $K!$ ways; this results in $K!$ permutations of the $y$ ranks for the sets arranged in order of their assigned $x$ ranks. In $(K - 1)!$ of these permutations $y_{m+i}$ corresponds to the $x$ rank $m + j$, so that the correction to the mid-rank $\sum D_i^2$ is

(7) $$\Delta_K = \frac{(K-1)!}{K!}\sum_{j=0}^{K-1}\sum_{i=0}^{K-1}(m + j - y_{m+i})^2 - \sum_{i=0}^{K-1}\left(m + \frac{K-1}{2} - y_{m+i}\right)^2 = \frac{K(K^2 - 1)}{12}.$$

It is to be noticed that the correction is positive and depends only on the number of equal $X$ variates. From this it can be concluded that for more than one group of equal variates, no matter whether $X$'s or $Y$'s, we can obtain the average $\sum D_i^2$ by computing a correction for each group and then adding these corrections to get the total correction to the mid-rank $\sum D_i^2$. Then, as before noted, we can by (4) calculate the average $\rho$ (denoted as $\bar{\rho}$).

This correction to $\sum D_i^2$ may be converted into a correction to $\rho_M$. That is, if

(8) $$S_{NK} = \frac{6\Delta_K}{N(N^2 - 1)} = \frac{K(K^2 - 1)}{2N(N^2 - 1)},$$

then

$$\bar{\rho} = \rho_M - \sum_i S_{NK_i},$$

where the summation extends over all groups of equal variates, and $K_i$ is the number of equal variates in the $i$th group.

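The corrections (7) and (8) are easy to tabulate. This Python sketch (hypothetical names, exact fractions) computes $\Delta_K$, $S_{NK}$ and the corrected coefficient $\bar{\rho} = \rho_M - \sum_i S_{NK_i}$; it can be used to regenerate or extend the table.

```python
from fractions import Fraction


def delta(K):
    """(7): correction to the mid-rank sum of D^2 for one group of K equal variates."""
    return Fraction(K * (K * K - 1), 12)


def s_nk(N, K):
    """(8): S_NK = 6 Delta_K / (N(N^2 - 1)) = K(K^2 - 1) / (2N(N^2 - 1))."""
    return 6 * delta(K) / (N * (N * N - 1))


def average_rho(rho_m, N, group_sizes):
    """rho-bar: rho_M minus the sum of S_NK over all groups of equal variates."""
    return rho_m - sum(s_nk(N, K) for K in group_sizes)


# One row of the table: S_NK for N = 14, rounded to four decimals.
print([round(float(s_nk(14, K)), 4) for K in range(2, 14)])
```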
A table of $S_{NK}$ for different values of $N$ and $K$ is given, and also a table of $\Delta_K$. The values $\Delta_K$ are given in the top row of the table, while the $S_{NK}$ are given in the rows below.
Table of Δ_K and S_NK

     K      2      3      4      5      6      7      8      9     10     11     12     13
   Δ_K    0.5      2      5     10   17.5     28     42     60   82.5    110    143    182

  S_NK
     N
     3  .1250      -      -      -      -      -      -      -      -      -      -      -
     4  .0500  .2000      -      -      -      -      -      -      -      -      -      -
     5  .0250  .1000  .2500      -      -      -      -      -      -      -      -      -
     6  .0143  .0571  .1429  .2857      -      -      -      -      -      -      -      -
     7  .0089  .0357  .0893  .1786  .3125      -      -      -      -      -      -      -
     8  .0060  .0238  .0595  .1190  .2083  .3333      -      -      -      -      -      -
     9  .0042  .0167  .0417  .0833  .1458  .2333  .3500      -      -      -      -      -
    10  .0030  .0121  .0303  .0606  .1061  .1697  .2545  .3636      -      -      -      -
    11  .0023  .0091  .0227  .0455  .0795  .1273  .1909  .2727  .3750      -      -      -
    12  .0017  .0070  .0175  .0350  .0612  .0979  .1469  .2098  .2885  .3846      -      -
    13  .0014  .0055  .0137  .0275  .0481  .0769  .1154  .1648  .2266  .3022  .3929      -
    14  .0011  .0044  .0110  .0220  .0385  .0615  .0923  .1319  .1813  .2418  .3143  .4000
    15  .0009  .0036  .0089  .0179  .0313  .0500  .0750  .1071  .1473  .1964  .2554  .3250
    16  .0007  .0029  .0074  .0147  .0257  .0412  .0618  .0882  .1213  .1618  .2103  .2676
    17  .0006  .0025  .0061  .0123  .0214  .0343  .0515  .0735  .1011  .1348  .1752  .2230
    18  .0005  .0021  .0052  .0103  .0181  .0289  .0433  .0619  .0851  .1135  .1476  .1878
    19  .0004  .0018  .0044  .0088  .0154  .0246  .0368  .0526  .0724  .0965  .1254  .1596
    20  .0004  .0015  .0038  .0075  .0132  .0211  .0316  .0451  .0620  .0827  .1075  .1368
    21  .0003  .0013  .0032  .0065  .0114  .0182  .0273  .0390  .0536  .0714  .0929  .1182
    22  .0003  .0011  .0028  .0056  .0099  .0158  .0237  .0339  .0466  .0621  .0807  .1028
    23  .0002  .0010  .0025  .0049  .0086  .0138  .0208  .0296  .0408  .0543  .0707  .0899
    24  .0002  .0009  .0022  .0043  .0076  .0122  .0183  .0261  .0359  .0478  .0622  .0791
    25  .0002  .0008  .0019  .0038  .0067  .0108  .0162  .0231  .0317  .0423  .0550  .0700
    26  .0002  .0007  .0017  .0034  .0060  .0096  .0144  .0205  .0282  .0376  .0489  .0622
    27  .0002  .0006  .0015  .0031  .0053  .0085  .0128  .0183  .0252  .0336  .0437  .0556
    28  .0001  .0005  .0014  .0027  .0048  .0077  .0115  .0164  .0226  .0301  .0391  .0498
    29  .0001  .0005  .0012  .0025  .0043  .0069  .0103  .0148  .0203  .0271  .0352  .0448
    30  .0001  .0004  .0011  .0022  .0039  .0062  .0093  .0133  .0184  .0245  .0318  .0405
    35  .0001  .0003  .0007  .0014  .0025  .0039  .0059  .0084  .0116  .0154  .0200  .0255
    40  .0000  .0002  .0005  .0009  .0016  .0026  .0039  .0056  .0077  .0103  .0134  .0171
    45  .0000  .0001  .0003  .0007  .0012  .0018  .0028  .0040  .0054  .0072  .0094  .0120
    50  .0000  .0001  .0002  .0005  .0008  .0013  .0020  .0029  .0040  .0053  .0069  .0087
    60  .0000  .0001  .0001  .0003  .0005  .0008  .0012  .0017  .0023  .0031  .0040  .0051
    70  .0000  .0000  .0001  .0002  .0003  .0005  .0007  .0010  .0014  .0019  .0025  .0032
    80  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0010  .0013  .0017  .0021
    90  .0000  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0009  .0012  .0015
   100  .0000  .0000  .0000  .0001  .0001  .0002  .0003  .0004  .0005  .0007  .0009  .0011
As an example of the use of the table we will consider the following problem […, p. 56], with the ranks assigned as for the mid-rank method.

Subject      I      II
   A         1      2.5
   B         4     10
   C         4      2.5
   D         4      5
   E         4      7
   F         4      2.5
   G         7      8
   H         8      2.5
   I         9.5    6
   J         9.5   12
   K        11     11
   L        13     13
   M        13      9
   N        13     14

For the mid-rank method we have $\sum D_i^2 = 119.5$, $N = 14$,

$$\rho_M = 1 - \frac{6(119.5)}{14(196 - 1)} = 0.7374.$$

Referring to the table we find that

 K_i     Δ_{K_i}    S_{N K_i}
  2        0.5       0.0011
  3        2.0       0.0044
  4        5.0       0.0110
  5       10.0       0.0220
Total     17.5       0.0385

We know that $\bar{\rho} = 1 - \dfrac{6(119.5 + 17.5)}{14(196 - 1)} = 0.6989$, and in terms of $S_{NK}$, $\bar{\rho} = 0.7374 - 0.0385 = 0.6989$.
The value given by DuBois for his method is 0.7511. 
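The worked example can be checked against the definition of $\bar{\rho}$ by brute force. The sketch below takes the mid-ranks of the example, computes $\bar{\rho}$ through the group corrections $K(K^2 - 1)/12$, and separately averages (4) over every way of arbitrarily ordering each group of equal variates, independently in the two rankings (our reading of "all possible ways"). Helper names are hypothetical, and the enumeration is only practical when the groups of ties are small.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product


def tie_groups(midranks):
    """Map each (mid-)rank value to the list of positions that share it."""
    groups = defaultdict(list)
    for pos, r in enumerate(midranks):
        groups[Fraction(r)].append(pos)
    return groups


def rho(sum_d2, n):
    """Formula (4): 1 - 6 * sum(D^2) / (N(N^2 - 1))."""
    return 1 - Fraction(6) * sum_d2 / (n * (n * n - 1))


def corrected_rho(x, y):
    """Mid-rank sum of D^2 plus K(K^2 - 1)/12 for each group of equal variates."""
    d2 = sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(x, y))
    correction = sum(Fraction(len(g) * (len(g) ** 2 - 1), 12)
                     for ranks in (x, y) for g in tie_groups(ranks).values())
    return rho(d2 + correction, len(x))


def rank_assignments(midranks):
    """Yield every integer ranking obtained by ordering each tie group arbitrarily."""
    options = []
    for r, positions in tie_groups(midranks).items():
        low = r - Fraction(len(positions) - 1, 2)     # smallest rank in the block
        if low.denominator != 1:
            raise ValueError("input is not a valid set of mid-ranks")
        block = range(int(low), int(low) + len(positions))
        options.append([(positions, perm) for perm in permutations(block)])
    for choice in product(*options):
        ranks = [0] * len(midranks)
        for positions, perm in choice:
            for pos, rank in zip(positions, perm):
                ranks[pos] = rank
        yield ranks


def brute_force_rho(x, y):
    """Average of (4) over all arbitrary rank assignments, ties in x and y independently."""
    y_rankings = list(rank_assignments(y))
    total = count = 0
    for xr in rank_assignments(x):
        for yr in y_rankings:
            total += sum((a - b) ** 2 for a, b in zip(xr, yr))
            count += 1
    return rho(Fraction(total, count), len(x))


# Mid-ranks of the fourteen subjects A..N in the example.
rank_i = [1, 4, 4, 4, 4, 4, 7, 8, 9.5, 9.5, 11, 13, 13, 13]
rank_ii = [2.5, 10, 2.5, 5, 7, 2.5, 8, 2.5, 6, 12, 11, 13, 9, 14]
print(float(corrected_rho(rank_i, rank_ii)),
      corrected_rho(rank_i, rank_ii) == brute_force_rho(rank_i, rank_ii))   # roughly 0.6989 True
```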


Conclusion. A method has been developed for the treatment of rank correlation where there are groups of equal variates. The method consists of applying a generally small correction to the value as ordinarily calculated by the mid-rank method, in order to find the value which would be obtained by averaging the values of the rank correlation coefficient for all possible ways of arbitrarily assigning ranks to the equal variates. Thanks are due Professor P. S. Dwyer, without whose aid and encouragement this paper would not have been written.
NOTE ON THEORETICAL AND OBSERVED DISTRIBUTIONS OF
REPETITIVE OCCURRENCES

By P. S. Olmstead

1. A simple problem of repetitive occurrences. Two questions which the 
engineer often desires to answer whenever he has a new type of apparatus or a 
new design of an old type of apparatus are: How many times will it perform 
its intended function without failure? and How many times will it fail to perform 
its intended function in a given length of time? To do this, he selects a number 
of what he believes to be identical units of the apparatus and gives each unit a
performance test under a uniform test procedure. The number of satisfactory 
operations prior to the first observed failure to perform this operation is called 
a “run” and is a measure of the type desired for each unit. 

If it is assumed that the probability of failure at any operation is a constant, $q$, and the probability of satisfactory operation is $1 - q$, or $p$, then the mathematical probabilities of runs of 0, 1, 2, 3, ... satisfactory operations for any unit are

(1) $$q,\; pq,\; p^2 q,\; p^3 q,\; \ldots,$$

respectively.

Let $x$ denote the number of satisfactory operations in any run. The mean value of $x$, say $m_x$, is given by

(2) $$m_x = \frac{p}{q}.$$

The variance of $x$ is

(3) $$\sigma_x^2 = \frac{p}{q^2}.$$


The first step in practice is to determine whether there exists a constant probability, $p$, by means of the application of the operation of statistical control.¹ Expressions (1), (2), and (3) provide the necessary information for doing this. When a constant probability exists, as evidenced by at least 25 consecutive samples of 4 units each, the following practical procedure has been found to be satisfactory.

1. An estimate of $p$ (or $q$), the sole parameter of the distribution, can be obtained from the average length of run in the sample, since $m_x = p/q$ gives $p = m_x/(1 + m_x)$. If $p$ is less than 0.6 and if the sample size is large, a reasonably good estimate of $p$ can be obtained from the proportion of the sample having runs of zero length, which estimates $q = 1 - p$.

2. The probability of getting runs of length $x$ or more is $p^x$. Thus, if a minimum (or maximum) value of the probability, $p^x$, is chosen, a maximum (or minimum) expected length of run can be computed, by using the estimated value of $p$, for use as a criterion for looking for assignable causes of variation in the length of individual runs.

3. The average and standard deviation to be used in calculating the limits to be applied to successive samples of rational sub-groups in accordance with the Shewhart² Criterion I are given by Equations (2) and (3), in which the estimates of $p$ and $q$ are substituted.

¹ W. A. Shewhart, "Statistical Method from the Viewpoint of Quality Control," The Department of Agriculture Graduate School, Washington, 1939, Chapter I.

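The three-step procedure can be sketched in Python. The sketch assumes conventional three-sigma limits for averages of $n$ runs as the working form of Shewhart's Criterion I (our assumption; the cited reference defines the criterion itself), uses hypothetical function names, and requires $0 < p < 1$.

```python
from math import sqrt


def estimate_p(runs):
    """Estimate p from the average run length: m_x = p/q gives p = m/(1 + m)."""
    m = sum(runs) / len(runs)
    return m / (1 + m)


def run_mean_var(p):
    """Mean (2) and variance (3) of the run length for success probability p."""
    q = 1 - p
    return p / q, p / q ** 2


def long_run_threshold(p, prob):
    """Smallest run length x whose tail probability p**x does not exceed prob."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    x = 0
    while p ** x > prob:
        x += 1
    return x


def average_limits(p, n, k=3.0):
    """k-sigma limits for the average of n runs; the lower limit is floored at 0."""
    mean, var = run_mean_var(p)
    half_width = k * sqrt(var / n)
    return max(0.0, mean - half_width), mean + half_width


p_hat = estimate_p([0, 0, 1, 3])            # average run length 1, so p is 0.5
print(long_run_threshold(p_hat, 0.001))     # 10
```

The loop in `long_run_threshold` avoids rounding trouble that a closed form such as `ceil(log(prob) / log(p))` can show when the ratio is an exact integer.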
2. Application to a signal transmission problem. The theoretical solution 
given above is a direct answer to the first question at the head of this note. 

TABLE I 


Observed distributions of runs of x occurrences of event E for various test periods of 



The second question is also of interest, particularly when failure to perform an operation does not impair the apparatus unit for performance of additional operations. In cases of this type, the engineer often lets his test continue for test periods of particular lengths, measured in numbers of operations or sometimes in intervals of time (i.e., time intervals are often considered to be proportional to numbers of operations), and observes the number of failures during the test period for each unit. Thus, he may, after he has assured himself that control exists, arrange his data for each test period to show the frequency of occurrence of 0, 1, 2, 3, ... failures per unit.

² Loc. cit.

Data of this type, which are typical of those found in other studies made during the past two years, are presented in Table I. These were obtained in a signal transmission study in which the data for successive periods were obtained

TABLE II

Comparison of observed and theoretical values of averages and variances for distributions of Table I

Statistic or                            Test Period
parameter                 1      2      3      4      5      6      7      8     11     15
q̂, observed            .916   .853   .786   .719   .679   .646   .632   .617   .532   .491
m_x, observed          .098   .171   .269   .381   .448   .543   .537   .633   .917  1.026
m_x, theoretical*      .091   .172   .272   .390   .471   .548   .583   .620   .881  1.039
σ_x², observed         .091   .200   .343   .497   .556   .832   .760  1.075  1.783  1.921
σ_x², theoretical*     .098   .202   .346   .542   .693   .848   .924  1.005  1.658  2.117

* Based on assumption that q̂ is the true value of q.


TABLE III

Theoretical distributions corresponding to distributions of Table I, calculated by using q = q̂ as the true value of q

x = number of occurrences per period

                                              Test Period
x           Freq.       1       2       3       4       5       6       7       8      11      15
0           n_0*    878.0  1519.0   961.0   723.0   541.0   407.0   343.0   266.0   160.0    77.0
1           n_1      73.3   223.5   205.3   202.8   173.3   144.1   126.4   101.9    74.9    39.2
2           n_2       6.1    32.9    43.8    56.9    55.5    51.0    46.6    39.0    35.1    20.0
3           n_3        .5     4.8     9.4    16.0    17.8    18.0    17.1    14.9    16.5    10.2
4           n_4        .1      .7     2.0     4.5     5.7     6.4     6.3     5.7     7.7     5.2
5           n_5                .1      .4     1.3     1.8     2.3     2.3     2.2     3.6     2.6
6           n_6                        .1      .4      .6      .8      .9      .8     1.7     1.4
7           n_7                                .1      .2      .3      .3      .3      .8      .7
8           n_8                                        .1      .1      .1      .1      .4      .3
9 or over   n_9+                                                               .1      .3      .4
Sample size n*        958    1781    1222    1005     796     630     543     431     301     157

* The observed values of n_0 and n form the basis for the calculated distributions.


for separate units. Since each set of these data passed the scrutiny for control, there is justification for assuming that a statistical universe exists and that its functional form may be derived from the observed distribution. It was found that these data were consistent with the assumption that, where the probability of non-occurrence of a failure on a unit in the test period was $q$, the probability of exactly $x$ failures on a unit was $p^x q$. This set of mathematical probabilities is shown in (1), with $q$ redefined to apply in this case to non-occurrence of a failure.

Observed and "theoretical" values of the averages and variances for the observed distributions are shown in Table II. The basis for calculating the theoretical values was to take the ratio (designated $\hat{q}$) of $n_0$ to $n$ for each distribution as the estimate of the true value, $q$. Distributions as shown in Table III

TABLE IV

Test of fit of theoretical to observed distributions (Table III and Table I, respectively)


Test Period 



1 

2 


D 

6 

6 

1 

7 

1 

' 8 

1 

11 

15 

X'* 

2.24 

0.20 

0.32 


9.79 





3.98 

Degrees of 
Freedom 

i 

1 

2 

2 

3 

3 

3 

3 

3 

4 

4 


.13 

.90 

.87 

.55 


.87 

.36 


■ 

.41 


* Minimum number in cell for theoretical distribution taken as 6. 


were calculated from each $\hat{q}$. These distributions were tested against the observed distributions by means of the $\chi^2$ test, with the results shown in Table IV, which are all within reasonable limits of what might be expected when a constant probability exists.

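Table III follows from $\hat{q} = n_0/n$: the expected number of units with $x$ failures is $n\hat{q}\hat{p}^x$, and the theoretical mean and variance are $\hat{p}/\hat{q}$ and $\hat{p}/\hat{q}^2$. A minimal sketch with hypothetical names reproduces the first rows of two columns of Table III; it does not attempt the $\chi^2$ test.

```python
def theoretical_frequencies(n0, n, max_x):
    """Expected numbers of units with x = 0..max_x failures, n * q * p**x with q = n0/n.

    Also returns the expected number with more than max_x failures, n * p**(max_x + 1).
    """
    q = n0 / n
    p = 1 - q
    freqs = [n * q * p ** x for x in range(max_x + 1)]
    return freqs, n * p ** (max_x + 1)


def theoretical_mean_var(n0, n):
    """Mean p/q and variance p/q**2 of the number of failures per unit, with q = n0/n."""
    q = n0 / n
    p = 1 - q
    return p / q, p / q ** 2


freqs, tail = theoretical_frequencies(878, 958, 3)    # test period 1
print([round(f, 1) for f in freqs], round(tail, 1))
```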
3. Conclusions. When a constant probability applies to each operation in a repetitive process, this note shows how to establish criteria for identifying significantly long or short lengths for individual runs and significantly high or low average lengths for groups of several runs. A problem taken from the field of signal transmission gives assurance of the existence of this type of distribution in practice.

Bell Telephone Laboratories,

New York, N. Y.




















THE DISTRIBUTION THEORY OF RUNS
By A. M. Mood

1. Introduction. In studying a particular sample, the order in which the
elements of the sample were drawn is frequently available to the statistician.
This important information is usually entirely neglected by him. Such disregard
must be attributed, to a considerable extent, to the unsatisfactory state
of mathematical devices for using the knowledge in question. One reasonable
mathematical method for handling this information, the one to be used in this
paper, is to make use of the distribution of runs. A run is defined as a succession
of similar events preceded and succeeded by different events; the number of
elements in a run will be referred to as its length.

The distribution theory of runs has had a stormy career. The theory seems
to have been started toward the end of the nineteenth century rather than in the
days of Laplace, when there was so much interest in games of chance. In 1897
Karl Pearson [1], in a discussion of data taken from the roulette tables at Monte
Carlo, wrote "... the theory of runs is a very simple one." In this book he
developed no theory, but it is evident from his computations that he regarded the
distribution of runs as a special case of the multinomial distribution. The
multinomial method, besides evading the issue somewhat and raising questions
of random sampling, also gives incorrect results when one is interested in runs
of more than one kind of element. In 1899 Karl Marbe [2] derived an expression
for the mean of the number of iterations of a given length from a binomial
population. This result was incorrect because he neglected dependence between
overlapping iterations. An iteration is defined as a sequence of similar events; a
run of length t is counted as t - s + 1 iterations of length s for s ≤ t. Marbe
has assembled a great mass of data with the object of proving the popular
hypothesis that a "head" becomes highly probable after a long succession of
"tails" has appeared. Ordinary significance tests applied to his data do not
support this contention, but Marbe continues to advocate it [3] and [5]. Of
course, he has been severely criticised by many mathematical statisticians.

In 1904 Grünbaum [6] derived the mean of the number of runs of given length
from a binomial population by the multinomial method. The first correct
formulae were derived in 1906 by Bruns [7], who found the mean and variance of
the number of iterations of given length in samples from a binomial population.
In a book published in 1917 von Bortkiewicz correctly derived for the first time
the mean and variance of runs from a binomial population, using a method similar
to that of Bruns. This book [8] contains a great many formulae for means and
variances of runs and iterations under various special circumstances; a large
portion of it is devoted to an exhaustive criticism of Marbe's work. In 1921 von
Mises [9] showed that the number of long runs of given length was approximately
distributed according to the Poisson law for large samples.

It was not until 1925 (so far as the author has been able to ascertain) that an
actual distribution function appeared, when Ising [10] gave the number of ways of
obtaining a given total number of runs (without regard to length) from arrangements
of two kinds of elements. Stevens [12] in 1939 published the same distribution
and described a χ² criterion for significance. Wald and Wolfowitz [13]
in 1940 published the same distribution and showed that it was asymptotically
normal. These papers are all concerned with random arrangements of a fixed
number of elements of each of two kinds; the last mentioned paper describes a
very interesting application of the distribution to the problem of testing the
hypothesis that two samples have come from the same continuous distribution.
Wishart and Hirschfeld [11] in 1936 derived the distribution of the total number of
runs (again without regard to length) in samples from a binomial population and
showed it was asymptotically normal.

In this paper we shall derive distributions of runs of given length both from
random arrangements of fixed numbers of elements of two or more kinds, and
from binomial and multinomial populations. Also we shall give the limiting
form of these distributions as the sample size increases. These limiting distributions
are all normal. The distribution problem is, of course, a combinatorial
one, and the whole development depends on some identities in combinatory
analysis, some new and some well known to students of partition theory.

The paper will be divided into two parts. The first will deal with distributions
obtained from random arrangements of a fixed number of each kind of
element. The second will deal with distributions of elements from a binomial
or multinomial population.


Part I

2. Distribution of runs of two kinds of elements. Consider random arrangements
of n elements of two kinds, for example $n_1$ a's and $n_2$ b's with $n_1 + n_2 = n$.
Let $r_{1i}$ denote the number of runs of a's of length i and let $r_{2i}$ denote the number
of runs of b's of length i. For example the arrangement

abbabaaabbaaa

will be characterized by the numbers $r_{11} = 2$, $r_{13} = 2$, $r_{21} = 1$, $r_{22} = 2$, and all
other $r_{ij} = 0$. Also we let $r_1 = \sum_i r_{1i}$ and $r_2 = \sum_i r_{2i}$ denote the total number of

runs of a's and b's respectively. Throughout the paper a binomial coefficient
will be denoted by

(2.1)  $$\binom{m}{k} = \frac{m!}{k!\,(m-k)!},$$

and this is defined to be zero when m < k. A multinomial coefficient will often
be denoted by

(2.2)  $$\left[{m \atop m_i}\right] = \frac{m!}{m_1!\,m_2!\cdots},$$

(2.3)  $$\sum_i m_i = m, \qquad m_i \ge 0,$$

and when such a coefficient is to be summed over the indices $m_i$, the two conditions
(2.3) are always understood and will not be repeated; other conditions on
the indices will be placed below the summation sign.

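Given a set of numbers $r_{ij}$ ($i = 1, 2$; $j = 1, 2, \ldots, n_i$) such that $\sum_j j\,r_{ij} = n_i$,
there are $\left[{r_1 \atop r_{1j}}\right]$ and $\left[{r_2 \atop r_{2j}}\right]$ different arrangements of the runs of a's and b's respectively.
Hence the total number of ways of obtaining the set $r_{ij}$ is

(2.4)  $$\left[{r_1 \atop r_{1j}}\right]\left[{r_2 \atop r_{2j}}\right] F(r_1, r_2),$$

where $F(r_1, r_2)$ is the number of ways of arranging $r_1$ objects of one kind and $r_2$
objects of another so that no two adjacent objects are of the same kind. Thus

(2.5)  $$F(r_1, r_2) = \begin{cases} 0 & \text{if } |r_1 - r_2| > 1,\\ 1 & \text{if } |r_1 - r_2| = 1,\\ 2 & \text{if } r_1 = r_2. \end{cases}$$

Since there are $\binom{n}{n_1}$ possible arrangements of the a's and b's, we have at once the
distribution of the $r_{ij}$:

(2.6)  $$P(r_{ij}) = \left[{r_1 \atop r_{1j}}\right]\left[{r_2 \atop r_{2j}}\right] F(r_1, r_2) \Big/ \binom{n}{n_1}.$$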
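As a check on (2.5) and (2.6), the following Python sketch (an illustration added here, not part of Mood's paper) lists every arrangement of $n_1$ a's and $n_2$ b's, tabulates the run configuration $r_{ij}$ of each, and compares the exact frequencies with (2.6). It uses exact rational arithmetic via `fractions.Fraction`; the helper names are hypothetical, and brute force is only practical for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def run_counts(seq):
    """Map (kind, length) -> number of maximal runs of that kind and length."""
    counts = Counter()
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        counts[(seq[i], j - i)] += 1
        i = j
    return counts


def multinomial(parts):
    """m! / (m_1! m_2! ...) with m = sum(parts), as in (2.2)."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result


def F2(r1, r2):
    """Alternating arrangements of r1 and r2 objects, eq. (2.5)."""
    if abs(r1 - r2) > 1:
        return 0
    return 2 if r1 == r2 else 1


def prob_formula(runs, n1, n2):
    """P(r_ij) from eq. (2.6); `runs` maps (kind, length) -> count."""
    ra = [c for (kind, _), c in runs.items() if kind == "a"]
    rb = [c for (kind, _), c in runs.items() if kind == "b"]
    ways = multinomial(ra) * multinomial(rb) * F2(sum(ra), sum(rb))
    return Fraction(ways, comb(n1 + n2, n1))


def prob_enumerated(n1, n2):
    """Exact distribution of the run configuration over all C(n, n1) arrangements."""
    n = n1 + n2
    dist = Counter()
    for positions in combinations(range(n), n1):
        seq = ["b"] * n
        for p in positions:
            seq[p] = "a"
        dist[frozenset(run_counts(seq).items())] += 1
    return {key: Fraction(c, comb(n, n1)) for key, c in dist.items()}


example = run_counts("abbabaaabbaaa")
print(sorted(example.items()))
# [(('a', 1), 2), (('a', 3), 2), (('b', 1), 1), (('b', 2), 2)]

for n1, n2 in [(3, 2), (4, 4), (5, 3)]:
    for key, p in prob_enumerated(n1, n2).items():
        assert prob_formula(dict(key), n1, n2) == p
```

Enumeration grows like $\binom{n}{n_1}$, so this is a correctness check for small samples rather than a way to evaluate (2.6).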


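Certain marginal distributions will also be of interest. To obtain, for example,
the distribution of the $r_{1j}$, it is first necessary to sum over all partitions of
$n_2$. This is easily accomplished by finding the coefficient of $x^{n_2}$ in

$$(x + x^2 + x^3 + \cdots)^{r_2} = x^{r_2}(1 + x + x^2 + \cdots)^{r_2} = \sum_{t=0}^{\infty}\binom{r_2 + t - 1}{r_2 - 1} x^{r_2 + t}.$$

The term corresponding to $t = n_2 - r_2$ gives the desired result:

(2.7)  $$\sum \left[{r_2 \atop r_{2j}}\right] = \binom{n_2 - 1}{r_2 - 1},$$

the summation extending over all $r_{2j}$ with $\sum_j j\,r_{2j} = n_2$.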
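We have then

(2.8)  $$P(r_{1j}, r_2) = \left[{r_1 \atop r_{1j}}\right]\binom{n_2 - 1}{r_2 - 1} F(r_1, r_2) \Big/ \binom{n}{n_1},$$

and summing this over $r_2$ (only $r_2 = r_1 - 1$, $r_1$, $r_1 + 1$ contribute, and
$\binom{n_2-1}{r_1-2} + 2\binom{n_2-1}{r_1-1} + \binom{n_2-1}{r_1} = \binom{n_2+1}{r_1}$ by two applications of Pascal's rule), a slight simplification gives

(2.9)  $$P(r_{1j}) = \left[{r_1 \atop r_{1j}}\right]\binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}.$$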

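The distribution (2.6) summed over the $r_{1j}$ and $r_{2j}$ gives, by means of (2.7),

(2.10)  $$P(r_1, r_2) = \binom{n_1 - 1}{r_1 - 1}\binom{n_2 - 1}{r_2 - 1} F(r_1, r_2) \Big/ \binom{n}{n_1},$$

which is essentially the distribution derived by Wald and Wolfowitz [13], and
summing this over $r_2$ we get the distribution discussed by Stevens [12]

(2.11)  $$P(r_1) = \binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}.$$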

Another marginal distribution which will be useful is obtained by summing
(2.9) over the $r_{1j}$ for $j \ge k$. If we let

$$s_{1j} = r_{1j},\ j < k; \qquad s_{1k} = \sum_{j=k}^{n_1} r_{1j}; \qquad s_1 = \sum_{j=1}^{k} s_{1j} = r_1; \qquad A = \sum_{j=1}^{k-1} j\,s_{1j},$$

we must then sum the multinomial coefficient

$$\frac{s_{1k}!}{r_{1k}!\cdots r_{1n_1}!}$$

over all partitions of $n_1 - A$ such that every part is greater than $k - 1$. This
is given by the coefficient of $x^{n_1 - A}$ in

$$(x^k + x^{k+1} + \cdots)^{s_{1k}} = \sum_{t=0}^{\infty}\binom{s_{1k} - 1 + t}{s_{1k} - 1} x^{k s_{1k} + t};$$

thus we have

(2.12)  $$\sum{}' \frac{s_{1k}!}{r_{1k}!\cdots r_{1n_1}!} = \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1},$$

where $\sum'$ denotes summation over all non-negative integers $r_{1k}, r_{1,k+1}, \ldots, r_{1n_1}$
such that $\sum_{j=k}^{n_1} j\,r_{1j} = n_1 - A$. This identity with (2.9) gives

(2.13)  $$P(s_{1j}) = \left[{s_1 \atop s_{1j}}\right]\binom{n_2 + 1}{s_1}\binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}\Big/\binom{n}{n_1}, \qquad j = 1, 2, \ldots, k.$$
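A brute-force check of (2.13) is sketched below, grouping runs of a's of length k or more into $s_{1k}$. One assumption is made explicit in the code: when $s_{1k} = 0$ the last binomial in (2.13) is read as 1 if $A = n_1$ and 0 otherwise, since Python's `math.comb` rejects negative arguments. With k = 1 the formula reduces to (2.11), a case the loop also exercises. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb, factorial, prod


def s_vector(seq, k):
    """(s_11, ..., s_1k): runs of 'a' of length 1..k-1, then of length >= k."""
    s = [0] * k
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if seq[i] == "a":
            s[min(j - i, k) - 1] += 1
        i = j
    return tuple(s)


def p_s_formula(s, n1, n2):
    """Eq. (2.13). Assumption: for s_1k = 0 the last binomial is 1 if A == n1, else 0."""
    k = len(s)
    s1, s1k = sum(s), s[-1]
    A = sum((j + 1) * s[j] for j in range(k - 1))
    multi = factorial(s1) // prod(factorial(x) for x in s)
    if s1k == 0:
        tail = 1 if A == n1 else 0
    else:
        top = n1 - A - (k - 1) * s1k - 1
        tail = comb(top, s1k - 1) if top >= 0 else 0
    return Fraction(multi * comb(n2 + 1, s1) * tail, comb(n1 + n2, n1))


def p_s_enumerated(n1, n2, k):
    """Exact distribution of (s_11, ..., s_1k) over all arrangements."""
    n = n1 + n2
    counts = Counter()
    for positions in combinations(range(n), n1):
        seq = ["b"] * n
        for p in positions:
            seq[p] = "a"
        counts[s_vector(seq, k)] += 1
    return {s: Fraction(c, comb(n, n1)) for s, c in counts.items()}


for n1, n2, k in [(6, 4, 3), (5, 5, 2), (4, 3, 1)]:
    for s, p in p_s_enumerated(n1, n2, k).items():
        assert p_s_formula(s, n1, n2) == p
```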


Another useful distribution analogous to (2.13) is derived by considering runs
of both kinds of elements. If we define $s_{2j}$ ($j = 1, 2, \ldots, h$) and B in terms of
the $r_{2j}$ just as the $s_{1j}$ and A were defined above, it follows at once from (2.6) and (2.12)
that

(2.14)  $$P(s_{1i}, s_{2j}) = \left[{s_1 \atop s_{1i}}\right]\left[{s_2 \atop s_{2j}}\right]\binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}\binom{n_2 - B - (h-1)s_{2h} - 1}{s_{2h} - 1}F(s_1, s_2)\Big/\binom{n}{n_1},$$
$$i = 1, 2, \ldots, k; \qquad j = 1, 2, \ldots, h.$$

These last two distributions should be the most useful for applications. The
long runs have been added together to form the new variables $s_{1k}$ and $s_{2h}$, thus
decreasing materially the number of variables as compared with (2.6) and (2.9),
while at the same time little information is lost. One is free to choose k and h
so that the number of variables is appropriate for the data at hand. Moreover,
it is shown in Section 5 that these variables are asymptotically normally distributed,
so that one may apply a simple χ² test of significance for "randomness of
elements with respect to order" when dealing with large samples. We shall
then be able to test whether a sample has been "randomly" drawn in a certain
sense.




3. Moments for runs of two kinds of elements. Instead of dealing with the
ordinary moments we shall obtain formulae for the factorial moments, because
the expressions are much more compact. As is customary, a factorial will be
denoted by

(3.1)  $$x^{(a)} = x(x-1)(x-2)\cdots(x-a+1),$$

and $x^{(0)}$ is defined to be 1. Of course the ordinary moments are determined by
the factorial moments by means of relations of the type

$$x^a = \sum_{t=0}^{a} C_t\, x^{(t)}.$$

A recent discussion of the coefficients $C_t$ has been given by Joseph [14]. The
mathematical expectation of a function f(r) will be denoted by

(3.2)  $$E\big(f(r)\big) = \sum_r f(r)P(r).$$




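Of course E is a linear operator. We shall require the following identity

(3.3)  $$\sum_{(1)}\prod_i r_{1i}^{(a_i)}\left[{r_1 \atop r_{1i}}\right] = r_1^{(\sum a_i)}\binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1},$$

where $\sum_{(1)}$ denotes summation over all non-negative integers $r_{11}, r_{12}, \ldots, r_{1n_1}$ such
that $\sum_i i\,r_{1i} = n_1$. (3.3) may be verified by differentiating

$$\varphi(t_i) = (t_1 x + t_2 x^2 + \cdots)^{r_1}$$

$a_i$ times with respect to $t_i$ ($i = 1, 2, \ldots, n_1$), then finding the coefficient of $x^{n_1}$
after putting $t_i = 1$. The identity (3.3) enables us to find the factorial
moments of the variables in the distribution (2.9), for we have

(3.4)  $$\begin{aligned}
E\Big(\prod_i r_{1i}^{(a_i)}\Big) &= \sum_{r_1} r_1^{(\sum a_i)}\binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1}\binom{n_2 + 1}{r_1}\Big/\binom{n}{n_1}\\
&= (n_2 + 1)^{(\sum a_i)}\sum_{r_1}\binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1}\binom{n_2 + 1 - \sum a_i}{r_1 - \sum a_i}\Big/\binom{n}{n_1}\\
&= (n_2 + 1)^{(\sum a_i)}\binom{n - \sum (i+1)a_i}{n_1 - \sum i a_i}\Big/\binom{n}{n_1}.
\end{aligned}$$

The sum on $r_1$ involved in the last step is given by the identity

(3.5)  $$\sum_x \binom{\alpha}{x}\binom{\beta}{\gamma - x} = \binom{\alpha + \beta}{\gamma},$$

which is readily obtained by equating coefficients of $x^{\gamma}$ in

$$(1 + x)^{\alpha}(1 + x)^{\beta} = (1 + x)^{\alpha + \beta}.$$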

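We shall give here the means, variances and covariances obtained from (3.4):

(3.6)  $$E(r_{1i}) = \frac{n_1^{(i)}(n_2 + 1)^{(2)}}{n^{(i+1)}},$$

(3.7)  $$\sigma_{r_{1i}r_{1j}} = \frac{n_1^{(i+j)}\,n_2^{(2)}(n_2 + 1)^{(2)}}{n^{(i+j+2)}} - \frac{n_1^{(i)}n_1^{(j)}\big[(n_2 + 1)^{(2)}\big]^2}{n^{(i+1)}n^{(j+1)}}, \qquad i \ne j,$$

(3.8)  $$\sigma^2_{r_{1i}} = \frac{n_1^{(2i)}\,n_2^{(2)}(n_2 + 1)^{(2)}}{n^{(2i+2)}} + \frac{(n_2 + 1)^{(2)}n_1^{(i)}}{n^{(i+1)}}\left(1 - \frac{(n_2 + 1)^{(2)}n_1^{(i)}}{n^{(i+1)}}\right).$$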
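The closed forms (3.6) to (3.8) can be compared with exact moments computed over all arrangements. The sketch below assumes $n_2 \ge 2$: for $n_2 = 1$ some falling-factorial denominators vanish together with their numerators (for example when $n_1 = 2i$), and the unsimplified binomial form of (3.4) would then be needed. Function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def ff(x, a):
    """Falling factorial x^(a) = x (x-1) ... (x-a+1), eq. (3.1)."""
    out = 1
    for t in range(a):
        out *= x - t
    return out


def ratio(num, den):
    """Exact num/den; a zero numerator means the moment vanishes, so 0/0 is read as 0."""
    return Fraction(0) if num == 0 else Fraction(num, den)


def mean_r1i(i, n1, n2):
    """Eq. (3.6): E(r_1i)."""
    return ratio(ff(n1, i) * ff(n2 + 1, 2), ff(n1 + n2, i + 1))


def cov_r1(i, j, n1, n2):
    """Eq. (3.8) if i == j, else eq. (3.7). Assumes n2 >= 2."""
    n = n1 + n2
    mi, mj = mean_r1i(i, n1, n2), mean_r1i(j, n1, n2)
    second = ratio(ff(n1, i + j) * ff(n2, 2) * ff(n2 + 1, 2), ff(n, i + j + 2))
    return second + mi - mi * mi if i == j else second - mi * mj


def enumerated_moments(n1, n2):
    """Exact E(r_1i) and cov(r_1i, r_1j), i, j = 1..n1, over all arrangements."""
    n = n1 + n2
    rows = []
    for positions in combinations(range(n), n1):
        pos = set(positions)
        r = [0] * (n1 + 1)
        for p in positions:
            if p - 1 not in pos:  # a run of a's starts at p
                length = 1
                while p + length in pos:
                    length += 1
                r[length] += 1
        rows.append(r)
    N = len(rows)
    mean = [Fraction(sum(r[i] for r in rows), N) for i in range(n1 + 1)]
    cov = {(i, j): Fraction(sum(r[i] * r[j] for r in rows), N) - mean[i] * mean[j]
           for i in range(1, n1 + 1) for j in range(1, n1 + 1)}
    return mean, cov


for n1, n2 in [(4, 3), (5, 2), (3, 5)]:
    mean, cov = enumerated_moments(n1, n2)
    for i in range(1, n1 + 1):
        assert mean_r1i(i, n1, n2) == mean[i]
        for j in range(1, n1 + 1):
            assert cov_r1(i, j, n1, n2) == cov[(i, j)]
```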
These will be needed in the section dealing with asymptotic distributions. The
moments for the distribution (2.6) follow at once from (3.3) as

(3.9)  $$E\Big(\prod_i r_{1i}^{(a_i)}\prod_j r_{2j}^{(b_j)}\Big) = \sum_{r_1, r_2} r_1^{(\sum a_i)} r_2^{(\sum b_j)}\binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1}\binom{n_2 - \sum j b_j - 1}{r_2 - \sum b_j - 1}F(r_1, r_2)\Big/\binom{n}{n_1}.$$

The summation on $r_2$ is accomplished by putting $r_2 = r_1 - 1$, $r_1$ and $r_1 + 1$,
but after that has been done it is necessary to expand the product of the two
factorial factors in factorial powers of the lower index of one of the binomial
coefficients. This is easily done for the first few moments, but there appears
to be no simple expression for the general case. The means, variances and
covariances of the $r_{1i}$ are given by (3.6), (3.7) and (3.8), and those of the $r_{2i}$ are obtained
from these equations by interchanging $n_1$ and $n_2$. The other covariances are


(3.10) 


rii Tli , . 

4- 4 


yl(t+3+2) 


Ij(i434-U 


+ 2 




(ni + l)‘”(n2 + 


A slight variation of the method above will give the moments of the $s_{1i}$ in
the distribution (2.13). An accent on a summation sign will indicate that the
term corresponding to $i = k$ is to be omitted. Differentiating

$$\psi(t_i) = \big[t_1 x + t_2 x^2 + \cdots + t_{k-1}x^{k-1} + t_k(x^k + x^{k+1} + \cdots)\big]^{s_1}$$

$a_i$ times with respect to $t_i$ and finding the coefficient of $x^{n_1}$ after putting $t_i = 1$,
we obtain

(3.11)  $$\sum\prod_{i=1}^{k} s_{1i}^{(a_i)}\left[{s_1 \atop s_{1i}}\right]\binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} = s_1^{(\sum a_i)}\binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1}.$$


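This with (2.13) gives, by the same steps as used in obtaining (3.4),

(3.12)  $$E\Big(\prod_{i=1}^{k} s_{1i}^{(a_i)}\Big) = (n_2 + 1)^{(\sum a_i)}\binom{n - \sum (i+1)a_i + a_k}{n_1 - \sum i a_i}\Big/\binom{n}{n_1}.$$

The first two moments are

(3.13)  $$E(s_{1k}) = \frac{(n_2 + 1)\,n_1^{(k)}}{n^{(k)}},$$

(3.14)  $$\sigma^2_{s_{1k}} = \frac{(n_2 + 1)^{(2)}n_1^{(2k)}}{n^{(2k)}} + \frac{(n_2 + 1)n_1^{(k)}}{n^{(k)}}\left(1 - \frac{(n_2 + 1)n_1^{(k)}}{n^{(k)}}\right),$$

(3.15)  $$\sigma_{s_{1i}s_{1k}} = \frac{n_2(n_2 + 1)^{(2)}n_1^{(i+k)}}{n^{(i+k+1)}} - \frac{(n_2 + 1)^{(2)}n_1^{(i)}}{n^{(i+1)}}\cdot\frac{(n_2 + 1)n_1^{(k)}}{n^{(k)}}, \qquad i < k.$$

The others are, of course, given by (3.6), (3.7) and (3.8).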

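The joint moments of the variables in (2.14) as obtained from (3.11) are

(3.16)  $$E\Big(\prod_{i=1}^{k}s_{1i}^{(a_i)}\prod_{j=1}^{h}s_{2j}^{(b_j)}\Big) = \sum_{s_1, s_2}s_1^{(\sum a_i)}s_2^{(\sum b_j)}\binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1}\binom{n_2 - \sum j b_j + b_h - 1}{s_2 - \sum' b_j - 1}F(s_1, s_2)\Big/\binom{n}{n_1}.$$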

In addition to the covariances (3.10) we shall need 


^(si, S2) 


\niy 


(3.17) 


TjCk+J+l) ' 7j,(fc+j> ■ 


(m + l)^^^(n2 + 1)^°’^!*^ ni'> 


TjC*) TiO+l) > 

18^ . = ^ ^ _ (,^1 + l)(n2 + l)nW<> 

*'**’* 71(*-H) 71<*+A-0 


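The moments of $r_1$ in the distribution (2.11) may be derived easily by means
of (3.5) as

(3.19)  $$E(r_1^{(a)}) = (n_2 + 1)^{(a)}\binom{n - a}{n_1 - a}\Big/\binom{n}{n_1},$$

from which

(3.20)  $$E(r_1) = \frac{n_1(n_2 + 1)}{n},$$

(3.21)  $$\sigma^2_{r_1} = \frac{n_1^{(2)}(n_2 + 1)^{(2)}}{n^2(n - 1)}.$$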
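The mean and variance (3.20) and (3.21) of the number of runs of a's are easy to confirm exactly for small samples. A run of a's starts at every position holding an a whose predecessor is not an a, which is all the enumeration below uses; the function names are hypothetical and $n \ge 2$ is assumed.

```python
from fractions import Fraction
from itertools import combinations


def mean_r1(n1, n2):
    """Eq. (3.20): E(r1) = n1 (n2 + 1) / n."""
    return Fraction(n1 * (n2 + 1), n1 + n2)


def var_r1(n1, n2):
    """Eq. (3.21): n1^(2) (n2+1)^(2) / (n^2 (n - 1))."""
    n = n1 + n2
    return Fraction(n1 * (n1 - 1) * (n2 + 1) * n2, n * n * (n - 1))


def enumerated(n1, n2):
    """Exact mean and variance of r1 over all C(n, n1) arrangements."""
    n = n1 + n2
    values = []
    for positions in combinations(range(n), n1):
        pos = set(positions)
        # a new run of a's starts at p when position p-1 does not hold an a
        values.append(sum(1 for p in positions if p - 1 not in pos))
    N = len(values)
    m = Fraction(sum(values), N)
    v = Fraction(sum(x * x for x in values), N) - m * m
    return m, v


print(mean_r1(2, 2), var_r1(2, 2))  # 3/2 1/4
```

For large samples these reduce to $ne_1e_2$ and $ne_1^2e_2^2$, the values used in Corollary 1 of Section 5.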


4. Distribution and moments of runs of k kinds of elements. This section
is a generalization of the preceding two sections to several kinds of elements.
The case k = 2 was treated separately because the special character of the
function $F(r_1, r_2)$ in this instance made the distribution comparatively simple.
Now we shall be interested in k kinds of elements denoted by $a_1, \ldots, a_k$, and
we shall suppose there are $n_i$ elements of the ith kind. We let $r_{ij}$ denote the
number of runs of elements of the ith kind of length j, and put

$$n = \sum_{i=1}^{k} n_i, \qquad r_i = \sum_{j=1}^{n_i} r_{ij}.$$

The same argument as was used in deriving (2.6) gives

(4.1)  $$P(r_{ij}) = \prod_{i=1}^{k}\left[{r_i \atop r_{ij}}\right]F(r_1, \ldots, r_k)\Big/\left[{n \atop n_i}\right],$$
where the function $F(r_1, r_2, \ldots, r_k)$, which will be referred to hereafter simply
as $F(r_i)$, represents the number of different arrangements of $r_1$ objects of one
kind, $r_2$ objects of a second kind, and so forth, such that no two adjacent objects
are of the same kind. We shall be able to give the explicit expression for $F(r_i)$
after examining the marginal distribution $P(r_i)$. This is obtained by summing
(4.1) over the $r_{ij}$ with the $r_i$ fixed, by means of the identity (2.7), giving

(4.2)  $$P(r_i) = \prod_{i=1}^{k}\binom{n_i - 1}{r_i - 1}F(r_i)\Big/\left[{n \atop n_i}\right].$$


Despite our present meager knowledge of $F(r_i)$ it is possible to find the
moments of the $r_i$ as distributed by (4.2). Since $\sum_{r_i}P(r_i) = 1$, we have the
identity

(4.3)  $$\sum_{r_i}\prod_{i=1}^{k}\binom{n_i - 1}{r_i - 1}F(r_i) = \left[{n \atop n_i}\right].$$

From this the moments are easily derived. If we put

(4.4)  $$u_i = n_i - r_i,$$

we have


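$$\begin{aligned}
\sum_{r_i}\prod_{i=1}^{k}u_i^{(a_i)}\binom{n_i - 1}{r_i - 1}F(r_i) &= \sum_{r_i}\prod_{i=1}^{k}(n_i - r_i)^{(a_i)}\binom{n_i - 1}{n_i - r_i}F(r_i)\\
&= \prod_{i=1}^{k}(n_i - 1)^{(a_i)}\sum_{r_i}\prod_{i=1}^{k}\binom{n_i - a_i - 1}{r_i - 1}F(r_i) = \prod_{i=1}^{k}(n_i - 1)^{(a_i)}\left[{n - \sum a_i \atop n_i - a_i}\right].
\end{aligned}$$

The summation involved in the last step is given by (4.3), with each $n_i$ replaced by $n_i - a_i$.
On dividing the last equation by $\left[{n \atop n_i}\right]$ we get the factorial moments of the $u_i$

(4.5)  $$E\Big(\prod_{i=1}^{k}u_i^{(a_i)}\Big) = \prod_{i=1}^{k}(n_i - 1)^{(a_i)}\left[{n - \sum a_i \atop n_i - a_i}\right]\Big/\left[{n \atop n_i}\right].$$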
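For k kinds, (4.5) can be rewritten with falling factorials, since $\left[{n - \sum a_i \atop n_i - a_i}\right]\big/\left[{n \atop n_i}\right] = \prod_i n_i^{(a_i)}/n^{(\sum a_i)}$. The following Python sketch (illustrative names, exact arithmetic, small samples only) checks that form against enumeration of all distinct arrangements, together with the means, variances and covariances (4.6) to (4.8) that the text derives from it.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def ff(x, a):
    """Falling factorial x^(a) = x (x-1) ... (x-a+1)."""
    out = 1
    for t in range(a):
        out *= x - t
    return out


def u_factorial_moment(counts, a):
    """(4.5) in falling-factorial form:
    E(prod u_i^(a_i)) = prod (n_i - 1)^(a_i) n_i^(a_i) / n^(sum a_i)."""
    num = prod(ff(ni - 1, ai) * ff(ni, ai) for ni, ai in zip(counts, a))
    return Fraction(num, ff(sum(counts), sum(a))) if num else Fraction(0)


def mean_r(counts, i):
    """(4.6): E(r_i) = n_i (n - n_i + 1) / n."""
    n = sum(counts)
    return Fraction(counts[i] * (n - counts[i] + 1), n)


def cov_r(counts, i, j):
    """(4.8) when i == j, (4.7) otherwise."""
    n = sum(counts)
    if i == j:
        return Fraction(ff(counts[i], 2) * ff(n - counts[i] + 1, 2), n * ff(n, 2))
    return Fraction(ff(counts[i], 2) * ff(counts[j], 2), n * ff(n, 2))


def enumerate_runs(counts):
    """Total runs (r_1, ..., r_k) for every distinct arrangement."""
    letters = "".join(chr(ord("a") + i) * c for i, c in enumerate(counts))
    rows = []
    for seq in set(permutations(letters)):
        runs = [0] * len(counts)
        for pos, x in enumerate(seq):
            if pos == 0 or seq[pos - 1] != x:
                runs[ord(x) - ord("a")] += 1
        rows.append(runs)
    return rows
```

Enumerating distinct permutations through a `set` is wasteful but keeps the check obviously correct for the tiny samples it is meant for.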

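From these equations the moments of the $r_i$ may be found; the means, variances
and covariances are

(4.6)  $$E(r_i) = \frac{n_i(n - n_i + 1)}{n},$$

(4.7)  $$\sigma_{r_ir_j} = \frac{n_i^{(2)}n_j^{(2)}}{n\,n^{(2)}}, \qquad i \ne j,$$

(4.8)  $$\sigma^2_{r_i} = \frac{n_i^{(2)}(n - n_i + 1)^{(2)}}{n\,n^{(2)}}.$$

It is clear that

(4.9)  $$\psi(t_i) = \text{Coefficient of } \prod_i x_i^{n_i} \text{ in } \frac{1}{\left[{n \atop n_i}\right]}(x_1 + \cdots + x_k)^k\prod_{i=1}^{k}(x_1 + \cdots + x_{i-1} + t_ix_i + x_{i+1} + \cdots + x_k)^{n_i - 1}$$

is a generating function for the moments of the variables $u_i$. This generating
function will enable us to find the exact expression for $F(r_i)$, for we have

$$P(u_i = n_i - r_i) = \text{Coefficient of } \prod_i t_i^{n_i - r_i} \text{ in } \psi(t_i) = \sum\left[{k \atop \alpha_j}\right]\prod_{i=1}^{k}\left[{n_i - 1 \atop m_{ij}}\right]\Big/\left[{n \atop n_i}\right],$$

where the summation extends over all non-negative integers $\alpha_j$ and $m_{ij}$ ($i, j = 1, \ldots, k$)
with $m_{ii} = n_i - r_i$ and $\alpha_j + \sum_i m_{ij} = n_j$. Also

$$P(u_i = n_i - r_i) = \prod_{i=1}^{k}\binom{n_i - 1}{n_i - r_i}F(r_i)\Big/\left[{n \atop n_i}\right],$$

and equating the expressions on the right of the last two equations we have

(4.10)  $$F(r_i) = \sum\left[{k \atop \alpha_j}\right]\prod_{i=1}^{k}\left[{n_i - 1 \atop m_{ij}}\right]\Big/\prod_{i=1}^{k}\binom{n_i - 1}{n_i - r_i},$$

(4.11)  $$F(r_i) = \sum\left[{k \atop \alpha_j}\right]\prod_{i=1}^{k}{\left[{r_i - 1 \atop m_{ij}}\right]}',$$

in which the prime on the multinomial coefficient indicates that the indices corresponding to $j = i$
are to be omitted; hence i takes all the values $1, 2, \ldots, k$ and j takes all values
$1, 2, \ldots, k$ except i, because the index $m_{ii}$ has been cancelled with $n_i - r_i$ in
the binomial coefficient in the denominator of (4.10). It is clear from (4.11)
that $F(r_i)$ may be expressed as follows:

(4.12)  $$F(r_i) = \mathrm{CT}\;x_1^{-r_1}\cdots x_k^{-r_k}(x_1 + x_2 + \cdots + x_k)^k\prod_{i=1}^{k}(x_1 + \cdots + x_{i-1} + x_{i+1} + \cdots + x_k)^{r_i - 1},$$

in which "CT" is an abbreviation for "constant term of."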
We are now in a position to obtain moments of the variables $r_{ij}$ in the distribution
(4.1) by means of identities similar to (4.3). As an illustration we compute

k 


£ (:;::: 0 n :;)= s (-::: i) ft (:: i) 


.crna:r‘n(a:i+. 

1 I 

^CTl[xr''{xx + 

1 

_ fnl (n - ni)‘“' 


i 

Jl.-O 


+ t,Xi -(-••• -t- XkY' ^ 

• -|- Xk)" "{xi 4- ■ ■ • -b Xife)° 


n<“> 


or 


(1.13) £ ft;: “:ft ft:;) no. 

^ ’ Hn.) 

' 1 

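The moments of the $r_{ij}$ may be computed from identities of this type together with
(3.3). The first two moments are

(4.14)  $$E(r_{ij}) = \frac{(n - n_i + 1)^{(2)}n_i^{(j)}}{n^{(j+1)}},$$

(4.15)  $$E(r_{ij}^{(2)}) = \frac{(n - n_i)^{(2)}(n - n_i + 1)^{(2)}n_i^{(2j)}}{n^{(2j+2)}},$$

(4.16)  $$E(r_{ij}r_{it}) = \frac{(n - n_i)^{(2)}(n - n_i + 1)^{(2)}n_i^{(j+t)}}{n^{(j+t+2)}}, \qquad j \ne t,$$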


E(r„r,i) = (n, - j - 1) (n, -1 -1) 


71(i+(+2) 


{(n,-j-fl)®(n.-f-bl) 


\(a) 


-b 2(n — Hi — n.) (n, — j -b 1) (n. — f -b 1) {n, — t + n, — j) 

-b (n — n, — n*)®[(w. — t + 1)® + 2(n< — j + l)(n. — f + 1) 

+ («• ~ j + 1)^*'] + 2(n — n, — n,y^\ni — j + n, — t + 2) 

-I- (n — n. - 71,)®} + 2 ( 71 , - j - 1) 1 -j + l) 

.in,-t + D® -b (71 - - 71.)[2(71. -j + l)(7i, - f -b 1) 

-b ( 71 , — f + 1)®] + (ti — 71 , — 71 ,)®[ 2 ( 71 , -r f + 1) + (n. — i + 1)1 

-b (71 — Tli — 71,)®} -t- 2 ( 71 , — t — 1) '^(,+,+ 1 )— {(w, — < + 1) 

• (n,- — j + 1)® -b (71 — Tli — 7t,)[2(71, — j + 1)(71, — t + 1) 

+ (n, - J + 1)®] + (71 - 71, - 7i.)'*^[2(n. - j -t- 1) + ( 71 . - t + 1)1 

-b (71 - 71. - 71.)® 1 -b 4 {(m - j + 1)(71, - < + 1) 

-b (71 — 71, — 71.) (Tli — J + «* — i + 2) + (71 — Tli — 71,)®}. 
Such a lengthy expression as this last one can hardly be useful to the statistician,
and for this reason we shall not define variables $s_{ij}$ analogous to the $s_{1i}$ and $s_{2j}$
of Section 2 and take the time and space to find their moments.


5. Asymptotic distributions. We shall show that some of the distributions
obtained previously are asymptotically normal when the $n_i$ become large in
such a way that the ratios $n_i/n$ remain fixed. The description "asymptotically
normal" means that the distribution approaches the normal distribution uniformly
over any finite region as $n \to \infty$. The ratios $n_i/n$ will be denoted by
$e_i$, hence $\sum e_i = 1$. The symbol $O(1/n^{\alpha})$ will represent any function such that

$$\lim_{n\to\infty} n^{\alpha}\,O\Big(\frac{1}{n^{\alpha}}\Big) = L < \infty.$$

We shall not, of course, be able to get any limit theorems for distributions
like (2.6) or (2.9), because the number of independent variables increases with
n. We shall consider first the distribution (2.13), whose asymptotic character
is given in the following theorem.

Theorem 1. The variables

(5.1)  $$x_i = \frac{s_{1i} - ne_1^ie_2^2}{\sqrt{n}}, \quad i < k; \qquad x_k = \frac{s_{1k} - ne_1^ke_2}{\sqrt{n}}$$

are asymptotically normally distributed with zero means and variances and covariances

(5.2)  $$\begin{aligned}
\sigma_{ij} &= e_1^{i+j-1}e_2^3\big[(i+1)(j+1)e_1e_2 - ij\,e_2 - 2e_1\big], && i, j < k,\ i \ne j,\\
\sigma_{ii} &= e_1^{2i-1}e_2^3\big[(i+1)^2e_1e_2 - i^2e_2 - 2e_1\big] + e_1^ie_2^2, && i < k,\\
\sigma_{ik} &= e_1^{i+k-1}e_2^2\big[(i+1)k\,e_1e_2 - ik\,e_2 - e_1\big], && i < k,\\
\sigma_{kk} &= e_1^{2k-1}e_2\big[k^2(e_1 - 1)e_2 - e_1\big] + e_1^ke_2.
\end{aligned}$$

The limiting means, variances and covariances are obtained from the relations
(3.6), (3.7), (3.8), (3.13), (3.14) and (3.15).

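To demonstrate this theorem we make the substitutions

(5.3)  $$\begin{aligned}
n_i &= ne_i, && i = 1, 2,\\
s_{1i} &= ne_1^ie_2^2 + \sqrt{n}\,x_i, && i = 1, 2, \ldots, k - 1,\\
s_{1k} &= ne_1^ke_2 + \sqrt{n}\,x_k,\\
s_1 &= ne_1e_2 + \sqrt{n}\sum_{i=1}^{k}x_i,\\
A &= n\big[e_1 - e_1^k - (k-1)e_1^ke_2\big] + \sqrt{n}\sum_{i=1}^{k-1}i\,x_i
\end{aligned}$$

in (2.13), and estimate the factorials by means of Stirling's formula

(5.4)  $$m! = \sqrt{2\pi}\,m^{m + \frac12}e^{-m}\Big(1 + O\Big(\frac1m\Big)\Big).$$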
The result is an unwieldy expression which we shall not present at the moment.
First we note that the exponential factors cancel out, because the sum of the
lower indices of a binomial or multinomial coefficient is equal to the upper index.
Also we simplify the expression by considering in detail only terms which involve
the $x_i$; the normalizing constant can be determined from the final limit function.
Any function of the parameters will be represented by the letter K. Thus in
(5.4) we need consider only the factor $m^{m + \frac12}$. All factorials will be of the form

(5.5)  $$m = na + \sqrt{n}\,L(x) + b,$$

where $L(x)$ is a linear function of the $x_i$, and a and b are independent of n and the
$x_i$. Now


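$$m^{m + \frac12} = (na)^{na + \sqrt{n}L(x) + b + \frac12}\Big(1 + \frac{L(x)}{a\sqrt{n}} + \frac{b}{an}\Big)^{na + \sqrt{n}L(x) + b + \frac12}$$

and

(5.6)  $$\begin{aligned}
\log m^{m + \frac12} &= K + \sqrt{n}\,L(x)\log na + \Big(na + \sqrt{n}\,L(x) + b + \tfrac12\Big)\log\Big(1 + \frac{L(x)}{a\sqrt{n}} + \frac{b}{an}\Big)\\
&= K + \sqrt{n}\,L(x)\log na + \Big(na + \sqrt{n}\,L(x) + b + \tfrac12\Big)\Big(\frac{L(x)}{a\sqrt{n}} + \frac{b}{an} - \frac{L(x)^2}{2a^2n} + O\Big(\frac{1}{n^{3/2}}\Big)\Big)\\
&= K + \sqrt{n}\,L(x)(1 + \log na) + \frac{L(x)^2}{2a} + O\Big(\frac{1}{\sqrt{n}}\Big),
\end{aligned}$$

so terms arising from b (and $b + \frac12$ in the exponent) will be neglected, as they
give rise only to terms independent of the $x_i$ or of order $1/\sqrt{n}$. Of course
$\log(1 + O(1/m)) = O(1/m)$. Thus, keeping significant terms only, the result of the
substitutions (5.3) and (5.4) in (2.13), after taking logarithms and using (5.6), is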

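(5.7)  $$\begin{aligned}
-\log P = K &+ \sum_{i=1}^{k-1}\Big[\sqrt{n}\,x_i(\log ne_1^ie_2^2 + 1) + \frac{x_i^2}{2e_1^ie_2^2}\Big] + 2\Big[\sqrt{n}\,x_k(\log ne_1^ke_2 + 1) + \frac{x_k^2}{2e_1^ke_2}\Big]\\
&- \sqrt{n}\Big(\sum_{i=1}^{k}x_i\Big)(\log ne_2^2 + 1) + \frac{1}{2e_2^2}\Big(\sum_{i=1}^{k}x_i\Big)^2\\
&- \sqrt{n}\Big(\sum_{i=1}^{k-1}ix_i + kx_k\Big)(\log ne_1^{k+1} + 1) + \frac{1}{2e_1^{k+1}}\Big(\sum_{i=1}^{k-1}ix_i + kx_k\Big)^2\\
&+ \sqrt{n}\Big(\sum_{i=1}^{k-1}ix_i + (k-1)x_k\Big)(\log ne_1^k + 1) - \frac{1}{2e_1^k}\Big(\sum_{i=1}^{k-1}ix_i + (k-1)x_k\Big)^2 + O\Big(\frac{1}{\sqrt{n}}\Big).
\end{aligned}$$

The coefficients of $x_i$ ($i < k$) and $x_k$ are

$$\sqrt{n}\,\big(\log ne_1^ie_2^2 + 1 - \log ne_2^2 - 1 + i\log ne_1^k + i - i\log ne_1^{k+1} - i\big) = 0,$$
$$\sqrt{n}\,\big(2\log ne_1^ke_2 + 2 - \log ne_2^2 - 1 + (k-1)\log ne_1^k + (k-1) - k\log ne_1^{k+1} - k\big) = 0.$$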
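Hence only the quadratic terms remain and we have

(5.8)  $$-\log P = K + \frac12\sum_{i,j=1}^{k}\sigma^{ij}x_ix_j + O\Big(\frac{1}{\sqrt{n}}\Big),$$

where

(5.9)  $$\begin{aligned}
\sigma^{ij} &= \frac{1}{e_2^2} + \frac{ij\,e_2}{e_1^{k+1}}, && i, j < k,\ i \ne j,\\
\sigma^{ii} &= \frac{1}{e_1^ie_2^2} + \frac{1}{e_2^2} + \frac{i^2e_2}{e_1^{k+1}}, && i < k,\\
\sigma^{ik} &= \frac{1}{e_2^2} + \frac{i + i(k-1)e_2}{e_1^{k+1}}, && i < k,\\
\sigma^{kk} &= \frac{1}{e_2^2} + \frac{2}{e_1^ke_2} + \frac{k^2}{e_1^{k+1}} - \frac{(k-1)^2}{e_1^k}.
\end{aligned}$$

It is merely a matter of straightforward multiplication of the two matrices to
verify that $\|\sigma^{ij}\|$ is the inverse of $\|\sigma_{ij}\|$, hence is a positive definite matrix.
The details of the verification will be omitted. We have then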


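(5.10)  $$P = K\,e^{-\frac12\sum\sigma^{ij}x_ix_j}\Big(1 + O\Big(\frac{1}{\sqrt{n}}\Big)\Big).$$

In this equation K must necessarily contain the factor $n^{-k/2}$, because there
are k + 5 factorials in the denominator and 5 in the numerator of (2.13).
Since $\Delta s_{1i} = 1$, this factor, in view of (5.1), may be replaced by $\prod_i\Delta x_i$, so

(5.11)  $$P = \frac{|\sigma^{ij}|^{1/2}}{(2\pi)^{k/2}}\,e^{-\frac12\sum\sigma^{ij}x_ix_j}\prod_{i=1}^{k}\Delta x_i\,\Big(1 + O\Big(\frac{1}{\sqrt{n}}\Big)\Big).$$

If we restrict the $x_i$ to any finite region R in the x-space, the function $O(1/\sqrt{n})$
approaches zero uniformly as $n \to \infty$. Thus, if $A_i < B_i$ are any positive
numbers such that the corresponding values of $x_i$, say $a_i$ and $b_i$, obtained by
substituting $A_i$ and $B_i$ for $s_{1i}$ in (5.1), determine a rectangular region $R'$ ($a_i < x_i < b_i$)
which lies in R, we have

(5.12)  $$\sum_{s_{1i}=A_i}^{B_i}P = \sum_{x_i=a_i}^{b_i}\frac{|\sigma^{ij}|^{1/2}}{(2\pi)^{k/2}}e^{-\frac12\sum\sigma^{ij}x_ix_j}\prod_i\Delta x_i\Big(1 + O\Big(\frac{1}{\sqrt{n}}\Big)\Big) \longrightarrow \frac{|\sigma^{ij}|^{1/2}}{(2\pi)^{k/2}}\int_{R'}e^{-\frac12\sum\sigma^{ij}x_ix_j}\,dx_1\cdots dx_k$$

by the definition of a definite integral and Riemann's fundamental theorem.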

We have given some details of this proof in order that it may serve as a model
for other theorems of a similar nature which will appear later, and for which
a complete proof will not be given. Two immediate consequences of Theorem 1
will now be stated as corollaries.

Corollary 1. The variable

$$x = \frac{r - ne_1e_2}{\sqrt{n}\,e_1e_2},$$

where r is the total number of runs of one kind of element, is asymptotically normally
distributed with zero mean and unit variance. The limiting mean and variance
were computed from (3.20) and (3.21).

Corollary 2. The variable $Q = \sum\sigma^{ij}x_ix_j$ is asymptotically distributed according
to the $\chi^2$-law with k degrees of freedom.
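Corollary 2 turns Theorem 1 into a χ² test of randomness. The sketch below is an illustration, not the author's procedure: it builds $\|\sigma_{ij}\|$ from (5.2) and $\|\sigma^{ij}\|$ from (5.9), and it assumes $e_1$ and $e_2$ are replaced by the observed proportions $n_1/n$ and $n_2/n$. The resulting Q would be referred to a χ² table with k degrees of freedom; no table lookup is included, and the function names are hypothetical. Multiplying the two matrices is a quick consistency check on (5.2) and (5.9), and for k = 1 Q reduces to the square of the variable in Corollary 1.

```python
def limiting_cov(e1, k):
    """||sigma_ij|| of eq. (5.2); paper indices 1..k map to 0..k-1."""
    e2 = 1.0 - e1
    S = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i < k and j < k:
                v = e1 ** (i + j - 1) * e2**3 * ((i + 1) * (j + 1) * e1 * e2 - i * j * e2 - 2 * e1)
                if i == j:
                    v += e1**i * e2**2
            elif i < k or j < k:
                m = min(i, j)  # the index that is below k
                v = e1 ** (m + k - 1) * e2**2 * ((m + 1) * k * e1 * e2 - m * k * e2 - e1)
            else:
                v = e1 ** (2 * k - 1) * e2 * (k * k * (e1 - 1) * e2 - e1) + e1**k * e2
            S[i - 1][j - 1] = v
    return S


def limiting_precision(e1, k):
    """||sigma^ij|| of eq. (5.9), the inverse of (5.2)."""
    e2 = 1.0 - e1
    c = e1 ** (k + 1)
    P = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i < k and j < k:
                v = 1 / e2**2 + i * j * e2 / c
                if i == j:
                    v += 1 / (e1**i * e2**2)
            elif i < k or j < k:
                m = min(i, j)
                v = 1 / e2**2 + (m + m * (k - 1) * e2) / c
            else:
                v = 1 / e2**2 + 2 / (e1**k * e2) + k * k / c - (k - 1) ** 2 / e1**k
            P[i - 1][j - 1] = v
    return P


def chi_square_runs(s, n1, n2):
    """Q of Corollary 2 for observed (s_11, ..., s_1k); compare with chi^2 on k d.f."""
    k = len(s)
    n = n1 + n2
    e1, e2 = n1 / n, n2 / n
    means = [n * e1**i * e2**2 for i in range(1, k)] + [n * e1**k * e2]
    x = [(si - mi) / n**0.5 for si, mi in zip(s, means)]
    P = limiting_precision(e1, k)
    return sum(P[i][j] * x[i] * x[j] for i in range(k) for j in range(k))
```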

In exactly the same manner in which Theorem 1 was deduced from (2.13),
we may prove the following theorem corresponding to the distribution (2.14).

Theorem 2. The variables

(5.13)  $$x_i = \frac{s_{1i} - ne_1^ie_2^2}{\sqrt{n}}, \quad i < k; \qquad x_k = \frac{s_{1k} - ne_1^ke_2}{\sqrt{n}}; \qquad y_j = \frac{s_{2j} - ne_1^2e_2^j}{\sqrt{n}}, \quad j < h,$$

are asymptotically normally distributed with zero means and variances and
covariances

i, j < fcf 

i ^ kf 

i < k, 




= + l)(i + 

(txixi - el'~V2[ii + l)*eie2 - i% - 2 ei] + e{el 

(Txixt, = e\'^'‘~^e\{{i + l)keiei - ikei - Ci] 

<r**x* = fc’e* - ej + ejej, 
h 3 < h,
i < h, 


(6.14) ~ 1)0' + l)®iC2 ““ “ 262 ] 

‘^mvi — ®2* *®i[('^' + l)^6ifi2 — ‘iSi — 2 ^ 2 ] + elel 
<rx,v! ~ + l)(j| 4" 1)6162 — 2ze2 — 2jei + 4eie2 + 2] 

^ <k3 <h, 

(^x^vi = e\^^e\[k{3 + 1)6162 - 2{k — 1)62 - (j - l)ei + 2eye^ j < /i. 


These limiting variances were computed from the variances and covariances
given in Section 3. We have chosen the variable $s_{2h}$ of (2.14) as the dependent
variable. The proof of this theorem is omitted. From it the following corollaries
are deduced immediately.

Corollary 3. If $u_i = x_i$ and $u_{k+j} = y_j$ of (5.13), and $\|\sigma^{ij}\|$ ($i, j = 1, 2, \ldots, k + h - 1$)
denotes the inverse of (5.14), then the variable $Q = \sum\sigma^{ij}u_iu_j$ is
asymptotically distributed according to the $\chi^2$-law with $k + h - 1$ degrees of freedom.

Corollary 4. If $s_i = s_{1i} + s_{2i}$ denotes the total number of runs of both kinds of
elements of length i, and $s_k$ the total number of runs of length greater than $k - 1$,
then the variables

(5.15)  $$x_i = \frac{s_i - n(e_1^ie_2^2 + e_2^ie_1^2)}{\sqrt{n}}, \quad i < k; \qquad x_k = \frac{s_k - n(e_1^ke_2 + e_2^ke_1)}{\sqrt{n}}$$

are asymptotically normally distributed with zero means and variances and
covariances

(5.16)  $$\sigma_{ij} = \sigma_{x_ix_j} + \sigma_{x_iy_j} + \sigma_{y_ix_j} + \sigma_{y_iy_j}.$$

We have put h = k in Theorem 2 to obtain this result. The terms on the right
of (5.16) are defined by (5.14); terms which do not appear there may be found
by interchanging $e_1$ and $e_2$ in one of the relations. For example $\sigma_{y_hy_h}$ is given by
interchanging $e_1$ and $e_2$ in the fourth equation of the set (5.14).

Corollary 5. The variable $Q = \sum\sigma^{ij}x_ix_j$, where the $x_i$ are defined by (5.15)
and $\|\sigma^{ij}\|$ is the inverse of (5.16), is asymptotically distributed according to the
$\chi^2$-law with k degrees of freedom.

Corollary 6. If s denotes the total number of runs of both kinds of elements,
then the variable

$$z = \frac{s - 2ne_1e_2}{2\sqrt{n}\,e_1e_2}$$

is asymptotically normally distributed with zero mean and unit variance. This is
the result derived by Wald and Wolfowitz [13].
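Corollary 6 gives the familiar large-sample runs test. A minimal sketch, assuming $e_1$ is estimated by the observed proportion of a's and that both kinds occur in the sequence; the function names are illustrative. The exact mean $1 + 2n_1n_2/n$, obtained by adding (3.20) and its analogue for the b's, is included for comparison; it differs from the asymptotic centering $2ne_1e_2$ by exactly 1.

```python
from math import sqrt


def total_runs(seq):
    """Number of maximal runs of either kind."""
    return sum(1 for i in range(len(seq)) if i == 0 or seq[i] != seq[i - 1])


def exact_mean_total_runs(n1, n2):
    """Finite-sample mean of s: (3.20) for a's plus its analogue for b's = 1 + 2 n1 n2 / n."""
    return 1 + 2 * n1 * n2 / (n1 + n2)


def wald_wolfowitz_z(seq, a="a"):
    """Corollary 6: z = (s - 2 n e1 e2) / (2 sqrt(n) e1 e2), with e1 estimated
    by the observed proportion of `a`; both kinds must occur in seq."""
    n = len(seq)
    e1 = sum(1 for x in seq if x == a) / n
    e2 = 1.0 - e1
    return (total_runs(seq) - 2 * n * e1 * e2) / (2 * sqrt(n) * e1 * e2)


print(round(wald_wolfowitz_z("aaabbb"), 4))  # -0.8165
```

Too few runs (large negative z) suggests clustering of like elements; too many (large positive z) suggests alternation.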


6. Asymptotic distributions for k kinds of elements. We now investigate the
asymptotic character of the distribution (4.2),

(6.1)  $$P(r_i) = \prod_{i=1}^{k}\binom{n_i - 1}{r_i - 1}F(r_i)\Big/\left[{n \atop n_i}\right],$$

where $r_i$ is the total number of runs of the ith kind of element.

Theorem 1. If k > 2, the variables

(6.2)  $$x_i = \frac{r_i - ne_i(1 - e_i)}{\sqrt{n}}$$

are asymptotically normally distributed with zero means and variances and
covariances

(6.3)  $$\sigma_{ij} = e_i^2e_j^2, \qquad \sigma_{ii} = e_i^2(1 - e_i)^2.$$

The restriction k > 2 is made because in the case k = 2 the correlation between
the two variables approaches one, and the numbers $\sigma_{ij}$ are all equal. The result
may be called a degenerate normal distribution and might be included in the
theorem in this sense; we have chosen to omit it because this case is better taken
care of by Corollary 1 of the previous section.

The proof of this theorem will be simplified if in the moments (4.5) we replace
the numbers $n_i - 1$ by $n_i$. This substitution will not, of course, affect the
limiting moments. Hence we consider the variables $v_i$ with moments given by

(6.4)  $$E\Big(\prod_{i=1}^{k}v_i^{(a_i)}\Big) = \prod_{i=1}^{k}n_i^{(a_i)}\left[{n - \sum a_i \atop n_i - a_i}\right]\Big/\left[{n \atop n_i}\right],$$

and we shall show that the variables

(6.5)  $$x_i = \frac{v_i - ne_i^2}{\sqrt{n}}$$

are asymptotically normally distributed with zero means and variances and covariances
(6.3). It is possible to prove this statement by showing that the
characteristic function (Fourier transform) obtained by substituting $e^{\sqrt{-1}\,\theta_i}$ for $t_i$
in the moment generating function

(6.6)  $$\varphi(t_i) = \text{Coef. of } \prod_i x_i^{n_i} \text{ in } \prod_{i=1}^{k}(x_1 + \cdots + x_{i-1} + t_ix_i + x_{i+1} + \cdots + x_k)^{n_i}\Big/\left[{n \atop n_i}\right]$$

approaches the characteristic function of the corresponding normal distribution
as $n \to \infty$. This method is not appropriate for proving a similar theorem
which appears in Part II, and we prefer to give here a demonstration that will
suffice for both theorems.

In order to prove our theorem we consider the general term in the coefficient
of $\prod_i x_i^{n_i}$ in (6.6),

(6.7)  $$\prod_{i=1}^{k}\left[{n_i \atop m_{ij}}\right]t_i^{m_{ii}},$$

in which

(6.8)  $$\sum_{i=1}^{k}m_{ij} = n_j$$

must be required as well as the usual restriction on the indices of a multinomial
coefficient, $\sum_{j=1}^{k}m_{ij} = n_i$. Therefore only $(k-1)^2$ of the indices are independent.
Clearly $m_{ii} = v_i$. Now, without concerning ourselves about the statistical
significance of the variables $m_{ij}$, let us consider their distribution

(6.9)  $$P(m_{ij}) = \prod_{i=1}^{k}\left[{n_i \atop m_{ij}}\right]\Big/\left[{n \atop n_i}\right],$$

in which the variables corresponding to the values $i, j = 1, 2, \ldots, k - 1$ will
be chosen as the independent ones. We shall now prove a theorem from
which Theorem 1 follows immediately.

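Theorem 2. The variables

(6.10)  $$x_{ij} = \frac{m_{ij} - ne_ie_j}{\sqrt{n}}, \qquad i, j = 1, 2, \ldots, k - 1,$$

are asymptotically normally distributed with zero means and variances and covariances
given by

(6.11)  $$\sigma_{ij,st} = (\delta_{is}e_i - e_ie_s)(\delta_{jt}e_j - e_je_t), \qquad \text{so that} \quad \sigma_{ij,ij} = e_ie_j(1 - e_i)(1 - e_j),$$

where $\delta$ denotes the Kronecker delta.

First it is to be noted that the moments of the $m_{ij}$ are easily obtained from the
identity

(6.12)  $$\sum\prod_{i=1}^{k}\left[{n_i \atop m_{ij}}\right] = \left[{n \atop n_j}\right],$$

the summation extending over all $m_{ij}$ satisfying (6.8), as follows:

$$\sum\prod_{i,j}m_{ij}^{(a_{ij})}\prod_{i=1}^{k}\left[{n_i \atop m_{ij}}\right] = \prod_{i=1}^{k}n_i^{(\sum_j a_{ij})}\left[{n - \sum_{i,j}a_{ij} \atop n_j - \sum_i a_{ij}}\right],$$

and on dividing this last relation by $\left[{n \atop n_j}\right]$

(6.13)  $$E\Big(\prod_{i,j}m_{ij}^{(a_{ij})}\Big) = \frac{\prod_i n_i^{(\sum_j a_{ij})}\prod_j n_j^{(\sum_i a_{ij})}}{n^{(\sum_{i,j}a_{ij})}},$$

from which the moments (6.11) and the means in (6.10) were computed.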

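The proof of the theorem is similar to that of Theorem 1 in Section 5. We
make the substitutions

$$n_i = ne_i, \qquad m_{ik} = n_i - \sum_{j=1}^{k-1}m_{ij}, \qquad m_{kj} = n_j - \sum_{i=1}^{k-1}m_{ij}, \qquad m_{kk} = 2n_k + \sum_{i,j=1}^{k-1}m_{ij} - n,$$
$$m_{ij} = ne_ie_j + \sqrt{n}\,x_{ij}, \qquad i, j < k,$$

in (6.9) and employ Stirling's formula exactly as before. The details are too
similar to warrant repetition. The final result is

(6.14)  $$P(m_{ij}) = \frac{|\sigma^{ij,st}|^{1/2}}{(2\pi)^{(k-1)^2/2}}\,e^{-\frac12\sum\sigma^{ij,st}x_{ij}x_{st}}\prod_{i,j=1}^{k-1}\Delta x_{ij}\Big(1 + O\Big(\frac{1}{\sqrt{n}}\Big)\Big),$$

where $\|\sigma^{ij,st}\|$ is the inverse of (6.11) and is defined by

$$\sigma^{ij,st} = \Big(\frac{\delta_{is}}{e_i} + \frac{1}{e_k}\Big)\Big(\frac{\delta_{jt}}{e_j} + \frac{1}{e_k}\Big).$$

Theorem 1 is a corollary of Theorem 2. Also we may state these additional
results: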

Corollary 1. If k (≥ 3) kinds of elements are arranged at random and r
denotes the total number of runs of all kinds of elements, then the variable

$$x = \frac{r - n\big(1 - \sum e_i^2\big)}{\sqrt{n}}$$

is asymptotically normally distributed with zero mean and variance

$$\sigma^2 = \sum e_i^2 - 2\sum e_i^3 + \Big(\sum e_i^2\Big)^2,$$

where $e_i$ is the proportion of elements of the ith kind.

Corollary 2. The variable $Q = \sum\sigma^{ij}x_ix_j$, where the $x_i$ are defined by (6.2)
and $\|\sigma^{ij}\|$ is the inverse of (6.3), is asymptotically distributed according to the
$\chi^2$-law with k degrees of freedom.
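For three or more kinds, Corollary 1 of this section standardizes the total number of runs. The sketch below assumes the $e_i$ are replaced by the observed proportions of each kind; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def total_runs(seq):
    """Number of maximal runs of all kinds."""
    return sum(1 for i in range(len(seq)) if i == 0 or seq[i] != seq[i - 1])


def runs_z_k_kinds(seq):
    """Corollary 1, Section 6: x = (r - n (1 - sum e_i^2)) / sqrt(n), divided by sigma, where
    sigma^2 = sum e_i^2 - 2 sum e_i^3 + (sum e_i^2)^2 and the e_i are observed proportions.
    Intended for k >= 3 kinds."""
    n = len(seq)
    e = [c / n for c in Counter(seq).values()]
    s2 = sum(p**2 for p in e)
    s3 = sum(p**3 for p in e)
    sigma2 = s2 - 2 * s3 + s2**2
    x = (total_runs(seq) - n * (1 - s2)) / sqrt(n)
    return x / sqrt(sigma2)


print(round(runs_z_k_kinds("abcabc"), 6))  # 1.732051
```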

As was mentioned in Section 4, we could define variables $s_{ij}$ ($i = 1, 2, \ldots, k$
and $j = 1, 2, \ldots, h_i$, the $h_i$ being a set of k arbitrary integers) with a distribution
similar to (2.14). If one worked through the details he would find, no
doubt, that these variables are asymptotically normal. The matrix of variances
and covariances is so complicated, however, that such a theorem would
hardly be useful to the statistician, and the author does not feel that it would
be worthwhile to go through the long and tedious details merely for the sake of
completeness.

Part II

Instead of having the number of elements of each kind fixed, we now suppose
that they are randomly drawn from a binomial or multinomial population. The
numbers $n_i$ thus become random variables subject only to the restriction that
$\sum n_i = n$, the sample number. The development will be entirely analogous to
that of Part I, and the same notation will be used. The probability associated
with the ith kind of element will be denoted by $p_i$.

7. Distributions and moments. The major part of the derivation of the various distribution functions has already been done in Sections 2 and 3. With the distributions of these sections we need only employ the fundamental relation

$$P(X, Y) = P_1(X \mid Y)\,P_2(Y) \tag{7.1}$$

in order to obtain the distributions required here. $X$ will represent the set of variables $r_{ij}$ or $r_i$, and $Y$ the variables $n_i$. For the binomial population $P_2(Y)$ will be

$$P_2(n_1, n_2) = \binom{n}{n_1} p_1^{n_1} p_2^{n_2}. \tag{7.2}$$

Therefore we may write down at once the distributions

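$$P(r_{ij}, n_1) = \left[{r_1 \atop r_{1j}}\right]\left[{r_2 \atop r_{2j}}\right] F(r_1, r_2)\, p_1^{n_1} p_2^{n_2}, \tag{7.3}$$

$$P(r_{1j}, n_1) = \left[{r_1 \atop r_{1j}}\right]\binom{n_2 + 1}{r_1} p_1^{n_1} p_2^{n_2}, \tag{7.4}$$

$$P(r_1, n_1) = \binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1}{r_1} p_1^{n_1} p_2^{n_2}, \tag{7.5}$$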

(7.6) P(^„ »,) - [^](”- - ■')(”* t 

Ph-b- ih~ s,)p:^pr, 

\ Sth — 1 ) 

f = 1, •.■ j = 1, ■ • 


(7.7) 
corresponding to the distributions (2.6), (2.9), (2.11), (2.13) and (2.14) respectively. Of course there is some dependence among the arguments. In (7.4), for example, $n_1$ is determined by $\sum_j j\,r_{1j} = n_1$, and $n_2$ by $n - n_1 = n_2$. In the last three distributions one of the $n_i$ is independent, and one may sum these with respect to $n_1$ from zero to $n$ and obtain the distributions of the $r$'s alone. The results of such summations are quite cumbersome and in some cases can only be indicated, so we shall retain the $n_i$ as relevant variables. This remark applies also to the multinomial distribution.

We shall obtain expressions for the joint moments of the variables in these distributions. It is clear that the moments in Section 3 will be of considerable aid; for, using the notation of (7.1), we have

$$E\bigl(f(X)g(Y)\bigr) = \sum_{X,Y} f(X)g(Y)P(X, Y) = \sum_Y g(Y)P_2(Y)\Bigl[\sum_X f(X)P_1(X \mid Y)\Bigr], \tag{7.8}$$

and the sum in the bracket on the right has been computed in Section 3. It remains only for us to multiply the previous moments by $g(Y)P_2(Y)$ and sum on $Y$. Corresponding to (3.4), (3.12), (3.9) and (3.19) we have


(7.9) 

(7.10) 

(7.11) 


(7.12) 


sf-r’ fti-!;'’) -1 «!•’(». +1)''-’(“ r ^“■)prK’, 

\ 1 / ni“0 \ ni — Sza, / 

n sfr’) = t + !)““> (” ~ 

\ 1 / ni =0 \ rei — SlO, / 

E(»!-VS") = t,»!■>(«•+ I 

E(»i->fi*’n.g->)- i: 

\ 1 1 / \ Si 2/ flt i / 

/nz - 2A + h~ sj)pJ‘pr», 

\ S2 — S'6, — 1 / 


for moments from (7.4), (7.6), (7.5) and (7.7) respectively. In order to perform the summations indicated in these last relations it is necessary to expand the factors multiplying the binomial coefficient in factorial powers of its lower index. That is, we must write

0 + 1 . 

(7.13) n[‘‘\rh + I)'”’ = 2 CM, a, b)(ni - hY'K 

»—0 


Again it is not possible to give a simple expression for the coefficients $C_i(n, a, b)$ in general, but for the first few moments they present no difficulty. For example, from (7.9)
Eintru) = L Wi(n - ni + 1) ^

m-o \ m / 


1 «,”3 

Pa 


(7.14) 


= 2 [*(” - i + 1) + (ra - 2%){ni - i) + (ui - i)®] 


ni 




in — t — ly 
V ni- i > 

Ipr^Pa"* 

II 

|^i(n - t + 1) ( 

(n — % — 

\ n\ — i 


1 

1 

^ - (n - 


\ni — 1 — 1 

= [i(.n 

-! + !)+(« 

— 2i)(n — 


9^p?’ 
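The computation in (7.14) follows the route of (7.8): first average over the arrangements with $n_1$ fixed (the Part I setting), then weight by $P_2(n_1)$ and sum. The sketch below runs that route exactly with rational arithmetic for a small $n$ and compares the result with the closed form $E(r_{1i}) = p_1^ip_2[(n-i-1)p_2 + 2]$. It is an illustration only. The helper names and the choice $n = 7$, $p_1 = 1/3$ are assumptions, and the closed form is the one listed among the means of this section (valid for $i \le n - 1$).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def runs_exactly(seq, kind, length):
    """Number of maximal runs of `kind` whose length is exactly `length`."""
    count = current = 0
    for x in list(seq) + [None]:          # the sentinel closes a final run
        if x == kind:
            current += 1
        else:
            if current == length:
                count += 1
            current = 0
    return count

def mean_given_n1(n, n1, i):
    """E(r_1i | n_1): average over all C(n, n_1) equally likely arrangements."""
    total = 0
    for ones in combinations(range(n), n1):
        seq = [2] * n
        for pos in ones:
            seq[pos] = 1
        total += runs_exactly(seq, 1, i)
    return Fraction(total, comb(n, n1))

def mean_by_conditioning(n, i, p1):
    """E(r_1i) = sum over n_1 of P_2(n_1) * E(r_1i | n_1), the route of (7.8)."""
    p2 = 1 - p1
    return sum(comb(n, n1) * p1 ** n1 * p2 ** (n - n1) * mean_given_n1(n, n1, i)
               for n1 in range(n + 1))
```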




We give below some means, variances and covariances which will be required later:

$$E(r_{1i}) = p_1^i p_2\,[(n - i - 1)p_2 + 2],$$

$$E(s_{1k}) = p_1^k\,[(n - k)p_2 + 1],$$

"■'■i.'-ii = pl''’^Paf(«' — i — jV^pl 4- (n — i — i)P2(l + 5pi) + 6pi 

- I(n - i - l)p* + 2][(n - j ~ l)p, + 2]), 
= Pi^aK” - 2 i)®P 2 + (» - 2a)p,(l + 5pj) + 6pi 

- [(n - i - 1 )pj + 2]*} + PiPalCn - i - l)p, + 2], 
(7.15) = v\Vi{{n - i - j - 2)®PiPj + 4(n - i - j - l)p,pj, + 2 


- [(w - i - l)pa + 2][(n - j - l)pi + 2], 
, <^..*. 1 * = ?{■"*?*{(« - i - + 1)'*’ - 2(n - t - ft)®pi 

+ (n - i - fc - l)®pi - [(n - a - l)p 2 + 2][(ra - fc)pi + 1]}, 
= P“{(n - 2* + 1)® - 2(n - 2fc)®pi + (n - 2kfVi 

— [(n — k)pi + 1]*} + pi[(ra — k)pt + 1], 
= PiPaK” - k - j - 2)®p5pj + 2(n - k - j - l)pi(l + pj) 
+ 2(1 + pi) - pi[(n - k)jH + l][(n - j - l)pi + 2]). 
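The two means stated at the head of this list, $E(r_{1i}) = p_1^ip_2[(n-i-1)p_2+2]$ and $E(s_{1k}) = p_1^k[(n-k)p_2+1]$, can also be confirmed by brute force, without conditioning on $n_1$. The sketch below enumerates every two-kind sequence of length $n$ with its binomial probability. It checks only the means, not the variances and covariances. The function names and the values $n = 8$, $p_1 = 2/5$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def run_lengths(seq, kind):
    """Lengths of the maximal runs of `kind` in seq."""
    lengths, current = [], 0
    for x in list(seq) + [None]:
        if x == kind:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    return lengths

def exact_means(n, p1):
    """Exact E(r_1i) and E(s_1k) under independent trials with P(kind 1) = p1."""
    p2 = 1 - p1
    e_r = [Fraction(0)] * (n + 1)   # e_r[i]: runs of kind 1 of length exactly i
    e_s = [Fraction(0)] * (n + 1)   # e_s[k]: runs of kind 1 of length >= k
    for seq in product((1, 2), repeat=n):
        n1 = seq.count(1)
        w = p1 ** n1 * p2 ** (n - n1)
        for length in run_lengths(seq, 1):
            e_r[length] += w
            for k in range(1, length + 1):
                e_s[k] += w
    return e_r, e_s
```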

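In order to obtain the distribution of runs in samples from a multinomial population, we multiply the distributions of Section 4 by

$$P(n_i) = \left[{n \atop n_i}\right]\prod_i p_i^{n_i}. \tag{7.16}$$

Corresponding to (4.1) and (4.2) then, we have

$$P(r_{ij}, n_i) = \prod_i \left[{r_i \atop r_{ij}}\right] F(r_i) \prod_i p_i^{n_i}, \tag{7.17}$$

$$P(r_i, n_i) = \prod_i \binom{n_i - 1}{r_i - 1} F(r_i) \prod_i p_i^{n_i}. \tag{7.18}$$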
In (7.17) $r_{ij}$ is the number of runs of length $j$ of elements with probability $p_i$. In (7.18) $r_i$ is the total number of runs of elements with probability $p_i$. As before, we shall investigate in detail only the distribution (7.18). The moments of $n_i$ and $r_i$ follow at once from (7.8) and (4.5):

(7.19) E (fl (n(“'’= E ri - 1)‘‘‘’) " ^*’‘1 fl p"‘ 

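where $w_i = n_i - r_i$. The means, variances and covariances of the $r_i$ are

$$\begin{aligned}
E(r_i) &= np_i(1 - p_i) + p_i^2,\\
\sigma_{r_ir_j} &= -np_ip_j(1 - 2p_i - 2p_j + 3p_ip_j) - p_ip_j(2p_i + 2p_j - 5p_ip_j),\\
\sigma_{r_i}^2 &= np_i(1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2(3 - 8p_i + 5p_i^2).
\end{aligned} \tag{7.20}$$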
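Because (7.20) is exact for every $n \ge 2$, it can be checked by enumerating all $k^n$ sequences of a small multinomial sample. The sketch below is such a check and is not the author's method of derivation. It computes exact means and covariances of the $r_i$ with rational arithmetic and compares them with the three formulas of (7.20). The probabilities $(1/2, 1/3, 1/6)$ and the helper names are arbitrary assumptions.

```python
from fractions import Fraction
from itertools import product

def runs_per_kind(seq, k):
    """r_i: number of runs of kind i (kinds labelled 0, ..., k-1)."""
    r = [0] * k
    for t, x in enumerate(seq):
        if t == 0 or seq[t - 1] != x:
            r[x] += 1
    return r

def exact_moments(n, p):
    """Exact E(r_i) and Cov(r_i, r_j) by enumerating all k^n sequences."""
    k = len(p)
    mean = [Fraction(0)] * k
    second = [[Fraction(0)] * k for _ in range(k)]
    for seq in product(range(k), repeat=n):
        w = Fraction(1)
        for x in seq:
            w *= p[x]
        r = runs_per_kind(seq, k)
        for i in range(k):
            mean[i] += w * r[i]
            for j in range(k):
                second[i][j] += w * r[i] * r[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(k)] for i in range(k)]
    return mean, cov

def formula_7_20(n, p):
    """The right-hand sides of (7.20)."""
    k = len(p)
    mean = [n * pi * (1 - pi) + pi ** 2 for pi in p]
    cov = [[None] * k for _ in range(k)]
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            if i == j:
                cov[i][j] = (n * pi * (1 - 4 * pi + 6 * pi ** 2 - 3 * pi ** 3)
                             + pi ** 2 * (3 - 8 * pi + 5 * pi ** 2))
            else:
                cov[i][j] = (-n * pi * pj * (1 - 2 * pi - 2 * pj + 3 * pi * pj)
                             - pi * pj * (2 * pi + 2 * pj - 5 * pi * pj))
    return mean, cov
```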


8. Asymptotic distributions from binomial population. We turn our attention first to the distribution (7.7) and state a theorem analogous to Theorem 2 of Section 5.

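Theorem 1. The variables

$$\begin{aligned}
u_i &= x_i = \frac{r_{1i} - np_1^i p_2^2}{\sqrt{n}}, & i &= 1, \dots, k - 1,\\
u_k &= x_k = \frac{s_{1k} - np_1^k p_2}{\sqrt{n}}, & &\\
u_{k+j} &= y_j = \frac{r_{2j} - np_1^2 p_2^j}{\sqrt{n}}, & j &= 1, \dots, h - 1,\\
u_{k+h} &= z = \frac{n_1 - np_1}{\sqrt{n}}, & &
\end{aligned} \tag{8.1}$$

are asymptotically normally distributed with zero means and variances and covariances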

= i)}p2 — (2t -b l)pi‘p» + 2pi*'*^Vj) 

= ~ (f + J + l)pi'*’^pa + 2p{'*"'^V» I 
= —(i + k-\- l)pl'^*pj -b pi'^*'^*p|, 

= PiPi — (2fc -b l)pi*p» > 

^VtVi = ~ (» + J + l)PiP»^^ + 2p’pS‘'’^\ 

(8.2) = plpi - (2i -b l)ptpr' + 2p\vV^\ 

<faiv, = ~{i + j + ^)pl^^pi^ + ^Pi^^Pa^^, 

= — (A + i + 2)pi'^*p*^* + pl^'p»(l + Pj)i 
<r.,. = tplpl + pI'^‘pj( 1 - 4pj), 

<r**j = (k + 1)PiPj — Pi(l + Pi), 
ffvi, = iplpi + PiPa'^‘(l - 4pi), 

<r„ = Pipi. ' 
We have taken $s_{2h}$ and $n_2$ to be the dependent variables of (7.7). The method of proof of this theorem is the same as that of Theorem 1 in Section 5, and will be omitted. As consequences of the theorem we have

Corollary 1. The variable

$$Q = \sum_{i,j=1}^{k+h} \sigma^{ij} u_i u_j,$$

where $\|\sigma^{ij}\|$ is the inverse of the matrix of variances and covariances (8.2), is asymptotically distributed according to the $\chi^2$-law with $k + h$ degrees of freedom.

Corollary 2. Any subset $u_{i_1}, u_{i_2}, \dots, u_{i_m}$ of the variables (8.1) is asymptotically normally distributed with zero means and variances and covariances $\|\sigma_{i_j i_k}\|$, and

$$\sum_{j,k=1}^{m} \sigma^{i_j i_k} u_{i_j} u_{i_k}$$

is asymptotically distributed according to the $\chi^2$-law with $m$ degrees of freedom; here $\|\sigma^{i_j i_k}\|$ is the inverse of $\|\sigma_{i_j i_k}\|$.

Corollary 3. If $s_i = r_{1i} + r_{2i}$ represents the total number of runs of length $i$ of both kinds of elements, and $s_k$ the number of runs of length greater than $k - 1$, then the variables

$$X_i = \frac{s_i - n\left(p_1^i p_2^2 + p_1^2 p_2^i\right)}{\sqrt{n}}, \quad i = 1, \dots, k - 1, \qquad X_k = \frac{s_k - n\left(p_1^k p_2 + p_1 p_2^k\right)}{\sqrt{n}}, \tag{8.3}$$

are asymptotically normally distributed with zero means and variances and covariances

$$\sigma_{X_iX_j} = \sigma_{x_ix_j} + \sigma_{x_iy_j} + \sigma_{y_ix_j} + \sigma_{y_iy_j}, \tag{8.4}$$

where the terms on the right of (8.4) are defined by (8.2). We have put $h = k$ in Theorem 1 to obtain this result.

Corollary 4. The variable

$$Q = \sum_{i,j=1}^{k} \sigma^{ij} X_i X_j, \tag{8.5}$$

where the $X_i$ are defined by (8.3) and $\|\sigma^{ij}\|$ is the inverse of (8.4), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 5. If $r$ denotes the total number of runs of both kinds of elements, then

$$X = \frac{r - 2np_1p_2}{2\sqrt{np_1p_2(1 - 3p_1p_2)}} \tag{8.6}$$

is asymptotically normally distributed with zero mean and unit variance. This is the result obtained by Wishart and Hirshfeld [11].
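Corollary 5 suggests a simple simulation. Draw binomial sequences, form $X$ of (8.6), and check that its sample mean and variance sit near 0 and 1. The sketch below does this with a seeded generator. The sequence length, $p_1 = 0.3$, the seed and the number of replications are arbitrary assumptions, and the function names are illustrative rather than from the source.

```python
import random
from math import sqrt

def total_runs(seq):
    """Total number of runs (maximal blocks of equal elements) in seq."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:])) if seq else 0

def wishart_hirshfeld_x(seq, p1):
    """X of (8.6) for a two-kind sequence whose first kind has probability p1."""
    n, p2 = len(seq), 1 - p1
    r = total_runs(seq)
    return (r - 2 * n * p1 * p2) / (2 * sqrt(n * p1 * p2 * (1 - 3 * p1 * p2)))

def simulate_stat(n, p1, reps, seed=0):
    """Draw `reps` binomial sequences of length n and return their X values."""
    rng = random.Random(seed)
    return [wishart_hirshfeld_x([1 if rng.random() < p1 else 2 for _ in range(n)], p1)
            for _ in range(reps)]
```

The small positive offset of the exact mean, $E(r) = 2(n-1)p_1p_2 + 1$, is $O(n^{-1/2})$ after standardization and disappears in the limit.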
9. Asymptotic distributions from the multinomial population. In this section we assume $k > 2$ to avoid degenerate distributions. Because of the function $F(r_i)$ in (7.18) we do not investigate this distribution directly, but derive a more general asymptotic distribution as was done in Section 6. We consider the distribution

(9.1)

corresponding to (6.9). This is derived from (7.19) in the same manner as (6.9) was from (4.5). As before, we have replaced the numbers $n_i - 1$ in (7.19) by $n_i$, an unessential change as far as the asymptotic theory is concerned. We recall that

$$r_i = n_i - m_{ii}; \tag{9.2}$$

hence we need only show that the variables on the right are asymptotically normally distributed in order to have the same result for the $r_i$. Corresponding to Theorem 2 of Section 6, we state
Theorem 1. The variables

$$x_{ij} = \frac{m_{ij} - np_ip_j}{\sqrt{n}}, \qquad x_i = \frac{n_i - np_i}{\sqrt{n}}, \qquad i = 1, \dots, k - 1, \tag{9.3}$$

are asymptotically normally distributed with zero means and variances and covariances


(9.4) 


— ~^prPiP>Pt } 
ftu.it = ~3p,p.p(, 

= P<Pi(l - 3p<p,), 
~ ^ViVi ) 

<r„,, = -2piP., 

= 2p?(l - p.'), 


— 3p,p,P(, 

— p,Pj(l ~ 3p,), 

<7..... = pJ(l + 2p, - 3p!). 

v,,.. = •“2p,p,p,, 

<7„,i = P.P;(1 - 2p<), 


<7..v = P.(l - V^)^ 


In these relations the symbols are defined by 

~ , ea,, — ~ 

and different literal subscripts represent different numerical subscripts. These moments have been computed by means of the identity (6.12). The proof of the theorem is like that of Theorem 2 of Section 6 and will be omitted. We can now give the limiting form of the distribution of the $r_i$ in (7.18) as
Corollary 1. The variables

$$X_i = \frac{r_i - np_i(1 - p_i)}{\sqrt{n}}, \qquad i = 1, 2, \dots, k, \tag{9.5}$$

are asymptotically normally distributed with zero means and variances and covariances

$$\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2, \qquad \sigma_{ij} = -p_ip_j(1 - 2p_i - 2p_j + 3p_ip_j). \tag{9.6}$$

These limiting moments follow at once from equations (7.20).
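The limits (9.6) can also be reached through (9.2), $r_i = n_i - m_{ii}$, together with limiting covariances of the pair counts $m_{ab}$ (the number of adjacent positions showing kind $a$ followed by kind $b$) and the counts $n_c$. The sketch below computes those limits from a lag argument derived here, not quoted from the source. For independent trials, $\operatorname{Cov}(m_{ab}, m_{cd})/n$ collects a lag-0 term and two lag-1 overlap terms. The result reproduces (9.6) and several entries legible in (9.4), for example $p_ip_j(1 - 3p_ip_j)$ and $p_i^2(1 + 2p_i - 3p_i^2)$. The probabilities used in the check are arbitrary.

```python
from fractions import Fraction

def ind(cond):
    return 1 if cond else 0

def cov_pairs(p, a, b, c, d):
    """Limit of Cov(m_ab, m_cd)/n: lag-0 term plus the two lag-1 overlaps."""
    return (ind((a, b) == (c, d)) * p[a] * p[b]
            + ind(b == c) * p[a] * p[b] * p[d]
            + ind(a == d) * p[a] * p[b] * p[c]
            - 3 * p[a] * p[b] * p[c] * p[d])

def cov_pair_count(p, a, b, c):
    """Limit of Cov(m_ab, n_c)/n."""
    return p[a] * p[b] * (ind(a == c) + ind(b == c) - 2 * p[c])

def cov_counts(p, a, c):
    """Limit of Cov(n_a, n_c)/n."""
    return ind(a == c) * p[a] - p[a] * p[c]

def cov_runs(p, i, j):
    """Limit of Cov(r_i, r_j)/n obtained from r_i = n_i - m_ii, (9.2)."""
    return (cov_counts(p, i, j) - cov_pair_count(p, j, j, i)
            - cov_pair_count(p, i, i, j) + cov_pairs(p, i, i, j, j))
```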

Corollary 2. The variable

$$Q = \sum_{i,j=1}^{k} \sigma^{ij} X_i X_j,$$

where the $X_i$ are defined by (9.5) and $\|\sigma^{ij}\|$ is the inverse of (9.6), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 3. If $r = \sum_i r_i$ denotes the total number of runs, then

$$\frac{r - n\left(1 - \sum_i p_i^2\right)}{\sqrt{n}}$$

is asymptotically normally distributed with zero mean and variance

$$\sigma^2 = \sum_i p_i^2 + 2\sum_i p_i^3 - 3\left(\sum_i p_i^2\right)^2.$$
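Corollary 3 admits an exact check. Writing $r = 1 + \sum_t [X_t \ne X_{t+1}]$ and using indicator covariances (a derivation made here, not in the source) gives $\operatorname{Var}(r) = n\sigma^2 - \sum p_i^2 - 4\sum p_i^3 + 5(\sum p_i^2)^2$ for $n \ge 2$. The exact variance is therefore linear in $n$ with slope $\sigma^2$. The sketch below enumerates all sequences for two consecutive values of $n$ and confirms that the difference equals $\sigma^2$. It also checks that for $k = 2$ the formula reduces to $4p_1p_2(1 - 3p_1p_2)$, the variance behind Corollary 5 of Section 8. The probabilities are arbitrary assumptions.

```python
from fractions import Fraction
from itertools import product

def total_runs(seq):
    """Total number of runs in seq."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

def exact_var_total_runs(n, p):
    """Exact Var(r) over all k^n multinomial sequences of length n."""
    mean = second = Fraction(0)
    for seq in product(range(len(p)), repeat=n):
        w = Fraction(1)
        for x in seq:
            w *= p[x]
        r = total_runs(seq)
        mean += w * r
        second += w * r * r
    return second - mean ** 2

def corollary3_sigma2(p):
    """Limiting variance of (r - n(1 - sum p_i^2)) / sqrt(n)."""
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return s2 + 2 * s3 - 3 * s2 ** 2
```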


The author would like to record here his gratitude to Professor S. S. Wilks 
who suggested the problem and under whose direction this paper was written. 
A GENERALIZATION OF THE LAW OF LARGE NUMBERS

By Hilda Geiringer

It is well known that the law of large numbers can be established for dependent as well as for independent chance variables by using Tchebycheff's inequality [1] and assuming that the variance of the sum of the variables tends towards infinity less rapidly than $n^2$.

In recent years v. Mises has introduced the notion of statistical functions [2] and has shown that, under certain assumptions, the law of large numbers is still valid if, instead of the arithmetic mean of the $n$ observations $x_1, \dots, x_n$, a statistical function of these observations is considered. For example, in the very special case where the $n$ collectives which have been observed are identical $k$-valued arithmetic distributions with probabilities $p_1, \dots, p_k$ corresponding to the attributes $c_1, \dots, c_k$, and with observed relative frequencies $n_1/n, \dots, n_k/n$, one obtains the following result: for every $\varepsilon > 0$ it is to be expected, with a probability $P_n$ converging towards one as $n \to \infty$, that $|f(n_1/n, \dots, n_k/n) - f(p_1, \dots, p_k)| < \varepsilon$, under very general conditions concerning the function $f$.

In the present paper we shall generalize these new results so that they will apply also to collectives which are not independent.

1. Lemma concerning alternatives. Let us consider the $n$-dimensional collective consisting of a sequence of $n$ trials and let us assume that the $n$ trials are alternatives, i.e. for each trial there are only two possible results, which we denote by "success," "failure," by "occurrence," "non-occurrence," or by "1," "0." The total result of the $n$ trials is expressed by $n$ numbers each equal to 0 or 1. Let $v(x_1, x_2, \dots, x_n)$ be the probability of obtaining the result $x_1$ at the first trial, $x_2$ at the second one, ..., $x_n$ at the last one ($x_\nu = 0, 1$; $\nu = 1, \dots, n$). In the same way we introduce

$$v_{12}(x, y) = \sum_{x_3, \dots, x_n} v(x, y, x_3, \dots, x_n),$$

and generally $v_{\mu\nu}(x, y)$ as the probability that the $\mu$th result equals $x$ and the $\nu$th equals $y$ ($\mu \ne \nu$), and finally let $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$ be the probability that the $\mu$th result equals $x$. In particular let us write

$$v_\mu(1) = p_\mu, \qquad v_{\mu\nu}(1, 1) = p_{\mu\nu} \qquad (\mu, \nu = 1, \dots, n;\ \mu \ne \nu),$$

$p_\mu$ being the probability of success in the $\mu$th trial and $p_{\mu\nu}$ the probability of simultaneous success in both the $\mu$th and $\nu$th trials.

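The variance $s_n^2$ of the sum $x_1 + \dots + x_n$ is easily found:

$$\begin{aligned}
s_n^2 = \operatorname{Var}(x_1 + \dots + x_n) &= \sum_{x_1, \dots, x_n} (x_1 + \dots + x_n - p_1 - \dots - p_n)^2\, v(x_1, \dots, x_n)\\
&= \sum_{x_1, \dots, x_n} (x_1 - p_1)^2\, v(x_1, \dots, x_n) + \cdots + 2\sum_{x_1, \dots, x_n} (x_1 - p_1)(x_2 - p_2)\, v(x_1, \dots, x_n) + \cdots\\
&= \sum_{x_1} (x_1 - p_1)^2\, v_1(x_1) + \cdots + 2\sum_{x_1, x_2} (x_1 - p_1)(x_2 - p_2)\, v_{12}(x_1, x_2) + \cdots\\
&= p_1(1 - p_1) + \cdots + p_n(1 - p_n) + 2(p_{12} - p_1p_2) + \cdots + 2(p_{n-1,n} - p_{n-1}p_n).
\end{aligned}$$

Thus:

$$s_n^2 = \operatorname{Var}(x_1 + \dots + x_n) = \sum_{\nu=1}^{n} p_\nu(1 - p_\nu) + 2\sum_{\mu<\nu} (p_{\mu\nu} - p_\mu p_\nu). \tag{1}$$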

The first sum on the right is $\le n/4$; the second one consists of $N = \tfrac12 n(n-1)$ terms, therefore we cannot be sure that it tends toward zero after division by $n^2$. Putting $p_{\mu\nu} - p_\mu p_\nu = a_{\mu\nu}$ we see immediately:

(a) A necessary and sufficient condition for $\lim_{n\to\infty} s_n/n = 0$ is

$$\lim_{n\to\infty} \frac{1}{n^2}\sum_{\mu<\nu} a_{\mu\nu} = 0. \tag{2}$$
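Formula (1) needs no independence assumption. The sketch below makes this concrete for three alternatives under a deliberately dependent joint law $v$. It computes $\operatorname{Var}(x_1 + x_2 + x_3)$ directly from $v$, and again from the $p_\nu$ and $p_{\mu\nu}$ via (1). The particular law (weight proportional to $1 + (\text{number of successes})^2$) is a hypothetical choice made only for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def variance_direct(v):
    """Var(x_1 + ... + x_n) computed straight from the joint law v."""
    mean = sum(w * sum(x) for x, w in v.items())
    return sum(w * (sum(x) - mean) ** 2 for x, w in v.items())

def variance_by_formula(v, n):
    """Right-hand side of (1): sum p(1-p) + 2 * sum over mu < nu of a_munu."""
    p = [sum(w for x, w in v.items() if x[m] == 1) for m in range(n)]
    p2 = {(m, u): sum(w for x, w in v.items() if x[m] == 1 and x[u] == 1)
          for m, u in combinations(range(n), 2)}
    return (sum(q * (1 - q) for q in p)
            + 2 * sum(p2[m, u] - p[m] * p[u] for m, u in p2))

# Hypothetical dependent law on {0,1}^3.
n = 3
raw = {x: Fraction(1 + sum(x) ** 2) for x in product((0, 1), repeat=n)}
total = sum(raw.values())
v = {x: w / total for x, w in raw.items()}
```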


Denoting by $\sigma_\nu^2$ the variance of $v_\nu(x)$ and by $r_{\mu\nu}$ the correlation coefficient of $v_{\mu\nu}(x, y)$, we have

$$a_{\mu\nu} = p_{\mu\nu} - p_\mu p_\nu = r_{\mu\nu}\sigma_\mu\sigma_\nu.$$

We see that $a_{\mu\nu}$ takes values between $-1/4$ and $+1/4$, and our condition (2) postulates that the sum of these positive and negative terms tends towards infinity less rapidly than $n^2$. As to the meaning of the signs of these terms, we see that a term $a_{\mu\nu}$ will be $\gtreqless 0$ according as $p_{\mu\nu}/p_\nu \gtreqless p_\mu$. This means: the fact that the $\nu$th event has presented itself either makes the occurrence of the $\mu$th event more probable, or is without influence on it, or makes it less probable. And we see that $s_n/n$ tends toward zero only if there is a certain "equalization" or "stabilization" of positive and negative mutual influence. If in particular for a pair of values $\mu, \nu$ we have $r_{\mu\nu} = +1$, that is $v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 0) = 0$, the events must either both occur or both fail, and $p_\mu = p_\nu$. If $r_{\mu\nu} = -1$ we have $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 1) = 0$: the simultaneous occurrence is impossible and likewise the simultaneous failure, and $p_\mu + p_\nu = 1$. If we have $p_{\mu\nu} = 0$ (case of mutually exclusive events) then $p_\mu + p_\nu \le 1$.

Since $s_n^2 \ge 0$ and $\sum_{\nu=1}^{n} p_\nu(1 - p_\nu) = \sum_{\nu=1}^{n} \sigma_\nu^2 \le n/4$, we conclude from (1) that $\sum_{\mu<\nu} a_{\mu\nu} \ge -n/8$, and we obtain the following simple sufficient condition for the validity of (2):

(b) Let us denote by $m_n$ the number of all combinations $\mu, \nu$ ($\mu \le n$; $\nu \le n$; $\mu \ne \nu$) such that, however large $n$ may be, $a_{\mu\nu} > \varepsilon$, where $\varepsilon$ is a given positive number; then $\frac{1}{n^2}\sum_{\mu<\nu} a_{\mu\nu}$ converges toward zero if $\lim_{n\to\infty} m_n/n^2 = 0$.

We have in fact

$$-\frac{n}{8} \le \sum_{\mu<\nu} a_{\mu\nu} \le \frac{m_n}{4} + (N - m_n)\varepsilon,$$

and dividing by $n^2$ we find that $\frac{1}{n^2}\sum_{\mu<\nu} a_{\mu\nu}$ is enclosed between $-\frac{1}{8n}$ and $\frac{m_n}{4n^2} + \frac{\varepsilon}{2}$. The first bound tends toward zero, and since $\varepsilon$ may be taken arbitrarily small, so does the second. Roughly speaking, this condition implies that for "almost all" combinations of indices $\mu, \nu$ the $a_{\mu\nu}$ converge toward "negative or vanishing correlation."

On the other hand, the sum of all positive and negative terms in $\sum_{\mu<\nu} a_{\mu\nu}$ cannot become less than $-n/8$. Therefore, if "almost all" positive terms are supposed to tend towards zero, it follows that almost all negative terms also tend toward zero. Thus we obtain the sufficient condition (c), which is neither more nor less general than (b):

(c) The sum $\frac{1}{n^2}\sum_{\mu<\nu} a_{\mu\nu}$ tends towards zero as $n \to \infty$ if "almost all" the individual terms $a_{\mu\nu} = p_{\mu\nu} - p_\mu p_\nu$ tend toward zero. Or more exactly, the sum in question tends toward zero if $|a_{\mu\nu}| \le \varepsilon$ for every $\varepsilon$ and sufficiently large $n$, with the exception of $\rho_n$ terms, where $\lim_{n\to\infty} \rho_n/n^2 = 0$. That is "convergence towards independence" for almost all combinations $\mu, \nu$ of indices. Let us, for example, assume that all the $p_\nu$ are $> 0$ and all the $p_{\mu\nu} = 0$; then all the $a_{\mu\nu}$ are certainly $< 0$ and (b) is fulfilled; but it is easily seen (3) that in this case $p_1 + p_2 + \dots + p_n \le 1$. Therefore all the products $p_\mu p_\nu$ (with the possible exception of a finite number) tend toward zero, and (c) holds as well.


2. Statistical functions. Suppose $n$ observations have given the results $x_1, x_2, \dots, x_n$. Let us assume for the sake of simplicity that they are all bounded between two real numbers $A$ and $B$. To each real $x$ corresponds the number $nS_n(x)$ of observations with a result $\le x$. $S_n(x)$ is a monotone non-decreasing step function with $n$ steps, each of height $1/n$; however, several steps may coincide at the same point. We have

$$S_n(x) = 0 \text{ if } x < A \quad\text{and}\quad S_n(x) = 1 \text{ if } x \ge B. \tag{1}$$

$S_n(x)$ is called by v. Mises the partition (Aufteilung) of the $n$ observations. $S_n(x)$ coincides with the well-known cumulative frequency distribution if the attributes $c_\kappa$ ($\kappa = 1, \dots, k$) and the corresponding relative frequencies $n_1/n, \dots, n_k/n$ are given.
A statistical function is a function of $x_1, x_2, \dots, x_n$ which depends only on $S_n(x)$, the partition of the $n$ results. It will be denoted by $f\{S_n(x)\}$. If the $c_\kappa$ and the $n_\kappa/n$ are given, then "statistical function" means simply "function of the relative frequencies," and it becomes a function of $k$ variables. In $f\{S_n(x)\}$ the partition $S_n(x)$ takes the place of the independent variable. Such a statistical function has the following properties: (a) It is a symmetric function of $x_1, x_2, \dots, x_n$; that is, it is independent of the succession of the $n$ results. (b) It is "homogeneous" in the following sense: if instead of $n$ observations we have $nl$ observations, and at the same time each $x_\nu$ is replaced by $l$ observations equal to $x_\nu$, then the statistical function is not changed.¹ Examples of statistical functions are the moments

$$\frac{1}{n}\sum_{\nu=1}^{n} x_\nu^r = \int x^r\, dS_n(x) = M_r',$$

or, if $M_1' = \alpha$, the moments about the mean $\alpha$:

$$\frac{1}{n}\sum_{\nu=1}^{n} (x_\nu - \alpha)^r = \int (x - \alpha)^r\, dS_n(x) = M_r, \quad\text{etc.}$$
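Properties (a) and (b) can be seen on a toy sample. In the sketch below, a moment computed from the partition $S_n$ (represented by its jumps) is unchanged when the observations are reordered, and unchanged when each observation is taken $l = 3$ times. The product $x_1x_2\cdots x_n$ changes under repetition and so is not a statistical function. The sample values and helper names are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce
from operator import mul

def partition(xs):
    """Jumps of S_n: each distinct value mapped to its relative frequency."""
    n = len(xs)
    jumps = {}
    for x in xs:
        jumps[x] = jumps.get(x, Fraction(0)) + Fraction(1, n)
    return jumps

def raw_moment(xs, r):
    """M'_r = integral of x^r dS_n(x); it sees xs only through S_n."""
    return sum(w * x ** r for x, w in partition(xs).items())

def product_of_observations(xs):
    """x_1 x_2 ... x_n, which is not a statistical function."""
    return reduce(mul, xs, 1)
```

For example, with `xs = [1, 2, 2, 5]` the second moment is 17/2 whether or not every value is repeated three times, while the product jumps from 20 to 8000.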

The independent variable in $f\{S_n(x)\}$ is a partition; but in addition we shall define $f\{P(x)\}$, where $P(x)$ is a certain bounded distribution which is not necessarily a partition. A distribution $P(x)$ is called bounded if

$$P(x) = 0 \text{ if } x < A \quad\text{and}\quad P(x) = 1 \text{ if } x \ge B. \tag{1'}$$

If this is true for a sequence $P_1(x), P_2(x), \dots$ with the same $A$ and $B$, then the sequence is called uniformly bounded. Let us now consider a bounded distribution $P(x)$ which at every point of continuity of $P(x)$ is the limit, as $n \to \infty$, of a sequence of bounded partitions $S_n(x)$. As $S_n(x)$ converges toward $P(x)$, if $f\{S_n(x)\}$ converges towards a limit $L$ which does not depend on the limiting process $S_n(x) \to P(x)$, then that limit shall be denoted by $f\{P(x)\}$; it will be called the value of the statistical function at the "point" $P(x)$, and $f\{S_n(x)\}$ will be called continuous at $P(x)$. The definition of continuity can also be given in the following way: corresponding to every $\varepsilon > 0$ there exists an $\eta > 0$ such that

$$|f\{S_n(x)\} - f\{P(x)\}| < \varepsilon \tag{2}$$

for all values of $n$ and for every bounded $S_n(x)$ such that at every point of continuity of $P(x)$

$$|S_n(x) - P(x)| \le \eta. \tag{3}$$

In this case $f\{S_n(x)\}$ is called continuous at the point $P(x)$. Thus a statistical function is defined for bounded partitions and for certain bounded distributions which are not themselves partitions. If the continuity defined by (2) and (3) exists for a sequence $P_1(x), P_2(x), \dots$ of bounded distributions with the same $\eta$ corresponding to a given $\varepsilon$, we call the statistical function uniformly continuous at the points $P_1(x), P_2(x), \dots$.

¹ This condition of homogeneity is fulfilled e.g. for ... but not for $x_1x_2\cdots x_n$.


3. The general law of large numbers. The generalization of the law of large numbers which we have in mind can be demonstrated in a way analogous to the demonstration given by v. Mises in the case of independent collectives, if we introduce the results of paragraph 1 in order to estimate the variance. We shall consider here only one-dimensional, bounded collectives, in order to make clearer what is essential in the generalization.

A sequence of dependent collectives $P_1(x), P_2(x), \dots, P_n(x)$ can be given in the following manner. Let $P(x_1, x_2, \dots, x_n)$ be the probability that the result of the first observation is $\le x_1$, of the second $\le x_2$, ..., of the $n$th $\le x_n$. This distribution will be said to be bounded in $(A, B)$ if $P = 1$ when all the $x_\nu$ are $\ge B$, and $P = 0$ if at least one of these arguments is less than $A$. From this $n$-dimensional distribution we deduce $n$ one-dimensional distributions

$$P_1(x) = P(x, B, \dots, B), \quad P_2(x) = P(B, x, B, \dots, B), \quad \dots, \quad P_n(x) = P(B, \dots, B, x), \tag{1}$$

where $P_\nu(x)$ is the probability that the $\nu$th observation be $\le x$. The $P_\nu(x)$ are uniformly bounded in $(A, B)$, which is a consequence of $P(x_1, x_2, \dots, x_n)$ having been assumed to be bounded in this interval. In an analogous way we deduce from $P(x_1, x_2, \dots, x_n)$ the $\tfrac12 n(n-1)$ uniformly bounded two-dimensional distributions

$$P_{12}(x, y) = P(x, y, B, \dots, B), \quad P_{13}(x, y) = P(x, B, y, B, \dots, B), \quad \dots. \tag{2}$$

Here $P_{\mu\nu}(x, y)$ is the probability that the $\mu$th result is $\le x$ and the $\nu$th result $\le y$, and we have $P_{\mu\nu}(x, y) = P_{\nu\mu}(y, x)$. Of course we have also

$$\begin{aligned}
P_1(x) &= P_{12}(x, B) = P_{13}(x, B) = \dots = P_{1n}(x, B),\\
P_2(x) &= P_{12}(B, x) = P_{23}(x, B) = \dots = P_{2n}(x, B), \quad\text{etc.}
\end{aligned} \tag{1'}$$

If we put $x = y$ in (2) we obtain $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x)$, and we introduce

$$P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = P_{\nu\mu}(x), \tag{3}$$

the probability that both the $\mu$th and the $\nu$th observation are $\le x$. Then $P_{\mu\nu}(x)$ equals zero if $x < A$ and equals one if $x \ge B$, and this is valid with the same $A$ and $B$ for all the distributions $P_{\mu\nu}(x)$.

Now if $p_1, p_2, \dots, p_n$ are the probabilities of success for $n$ general alternatives, Tchebycheff's Lemma asserts that the probability $W$ that the average $(x_1 + x_2 + \dots + x_n)/n$ of $n$ observations differs by more than $\eta$ from its expectation $(p_1 + p_2 + \dots + p_n)/n$ is subject to the following inequality:

$$W \le \frac{1}{\eta^2}\operatorname{Var}\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) = \frac{s_n^2}{n^2\eta^2}. \tag{4}$$

Here $s_n^2$ is given by (1) of paragraph 1.
Let us introduce the average of the $P_\nu(x)$:

$$\bar P_n(x) = [P_1(x) + P_2(x) + \dots + P_n(x)]/n, \tag{5}$$

and let $Q_n$ be the probability that at any point of continuity of $\bar P_n(x)$ the inequality

$$|S_n(x) - \bar P_n(x)| > \eta \tag{6}$$

holds. Our aim will be to show that for every $\eta$, under certain restrictions regarding the given collectives, $Q_n$ tends toward zero as $n$ tends toward infinity. For a fixed point $x'$ the probabilities $P_\nu(x') = p_\nu$ and $P_{\mu\nu}(x') = p_{\mu\nu}$ are constants, and we put $\bar P_n(x') = \bar p_n = (p_1 + p_2 + \dots + p_n)/n$. The probability that at $x'$

$$|S_n(x') - \bar P_n(x')| > \eta/2 \tag{7}$$

is then, according to (4), smaller than $4(s_n^2)_{x'}/(n^2\eta^2)$. Here we denote by $(s_n^2)_{x'}$ the value of $s_n^2$ at $x'$ (as given by (1) in paragraph 1).

Now we divide the interval $(A, B)$ into $N$ parts in such a way that in every one of the $N$ intervals, e.g. in $(x', x'')$, the variation

$$\delta = \bar P_n(x'') - \bar P_n(x') \le \eta/2. \tag{8}$$

If there is at $x'$ (or at $x''$) a step of $\bar P_n(x)$, we take the limit which $\bar P_n(x)$ approaches as $x \to x'$ (or $x''$) from the interior of the interval. In order to obtain such a division we need only divide the total variation 1 of $\bar P_n(x)$ into $2/\eta$ equal parts and project these points of division on $\bar P_n(x)$, disposing however in a suitable way of horizontal parts of $\bar P_n(x)$. The abscissae of these points form the endpoints of the $N$ intervals. If there is a step of $\bar P_n(x)$ at an endpoint of one of these intervals, the variation in both the adjacent intervals can only be diminished. It is further possible that the two ends of an interval coincide, $x' = x''$; this will be so if $\bar P_n(x)$ has at $x'$ a step $> \eta/2$. In any case we have a division into $N \le 2/\eta$ intervals such that all the points of continuity of $\bar P_n(x)$ are enclosed in them and in each of these intervals (8) is valid.

Let us now assume that at the left endpoint $x'$ of the $\nu$th interval $(x', x'')$ the inequality

$$|S_n(x') - \bar P_n(x')| \le \eta/2 \tag{9}$$

is valid. Then we have for every $x$ between $x'$ and $x''$

$$|S_n(x) - \bar P_n(x)| \le \eta/2 + \delta \le \eta. \tag{10}$$

For, since $S_n(x)$ and $\bar P_n(x)$ are both monotone, the difference $S_n(x) - \bar P_n(x)$ cannot increase by more than $\delta \le \eta/2$ as $x$ varies from $x'$ to $x''$. Therefore if (6) is valid for any point $x$ in this interval, then (7) must be valid for the left endpoint $x'$ of this interval, and the probability $q_\nu$ of this latter inequality is less than or equal to $4(s_n^2)_{x'}/(n^2\eta^2)$.

But there are $N$ intervals with the left endpoints $x_1', x_2', \dots, x_N'$, and the probability that (6) may be valid at any point belonging to any one of these intervals is $\le q_1 + q_2 + \dots + q_N$. Denoting by $\bar s_n^2$ the greatest of the $N$ variances $(s_n^2)_{x_1'}, (s_n^2)_{x_2'}, \dots, (s_n^2)_{x_N'}$, we have for $Q_n$ (which is the probability that (6) may be valid at any point of continuity of $\bar P_n(x)$) the inequality

$$Q_n \le q_1 + q_2 + \dots + q_N \le N\,\frac{4\bar s_n^2}{n^2\eta^2} \le \frac{8\bar s_n^2}{n^2\eta^3}. \tag{11}$$

Therefore $Q_n$ tends toward zero for every $\eta$ if $\bar s_n/n$ tends toward zero.

But according to (2) in paragraph 1, $s_n/n$ tends toward zero if for every $x$ in $(A, B)$

$$\lim_{n\to\infty} \frac{1}{n^2}\sum_{\mu<\nu} \bigl[P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)\bigr] = 0. \tag{12}$$

Considering the definition of continuity of a statistical function, we have obtained the following result:

As in (1'), (2), (3) and (5), let $P_{\mu\nu}(x, y)$ be two-dimensional distributions ($\mu, \nu = 1, \dots, n$; $\mu \ne \nu$), uniformly bounded in $(A, B)$; $P_{\mu\nu}(x, B) = P_\mu(x)$; $P_{\mu\nu}(x, x) = P_{\mu\nu}(x)$; and $\bar P_\nu(x) = \frac{1}{\nu}\bigl(P_1(x) + P_2(x) + \dots + P_\nu(x)\bigr)$.

If the variable partition $S_n(x)$ is bounded in $(A, B)$ and if $f\{S_n(x)\}$ is uniformly continuous at the "points" $\bar P_1(x), \bar P_2(x), \dots$, then the probability that

$$|f\{S_n(x)\} - f\{\bar P_n(x)\}| > \varepsilon \tag{13}$$

tends toward zero for every $\varepsilon$ as $n \to \infty$, provided (12) is uniformly valid for every $x$ in $(A, B)$.

4. Examples. Let us illustrate by simple examples.

1) In order to define the $P_\nu(x)$ etc. mentioned in our theorem, we define the $n$-dimensional distribution $P(x_1, x_2, \dots, x_n)$ used at the beginning of paragraph 3 by indicating the probability density

$$v(x_1, x_2, \dots, x_n) = \begin{cases} C_n[1 - x_1x_2\cdots x_n] & \text{in the "unit cube"},\\ 0 & \text{elsewhere}. \end{cases} \tag{1}$$

The corresponding probability distribution is

$$P(x_1, x_2, \dots, x_n) = \int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_n} v(x_1, x_2, \dots, x_n)\, dx_1\cdots dx_n. \tag{2}$$

By putting

$$C_n = \frac{2^n}{2^n - 1} \tag{3}$$

we see that $P(x_1, x_2, \dots, x_n)$ equals unity if all the arguments are $\ge 1$, and it equals zero if one of these arguments is less than 0. Therefore $P(x_1, x_2, \dots, x_n)$ is bounded in the unit cube.

From (1) we deduce the two-dimensional densities

$$v_{\mu\nu}(x, y) = \begin{cases} C_n\left(1 - \dfrac{xy}{2^{n-2}}\right) & \text{in the unit square},\\ 0 & \text{elsewhere}, \end{cases} \tag{4}$$

and the distributions

$$P_{\mu\nu}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} v_{\mu\nu}(x, y)\, dx\, dy. \tag{5}$$

We see that

$$P_{\mu\nu}(x, y) = \begin{cases} C_n xy\left(1 - \dfrac{xy}{2^n}\right) & \text{in the unit square},\\ 0 & \text{if } x \text{ or } y \le 0,\\ 1 & \text{if } x \text{ and } y \ge 1, \end{cases}$$

and e.g. for $x \ge 1$, $0 < y < 1$ we have $P_{\mu\nu}(x, y) = P_{\mu\nu}(1, y)$, etc. Thus the $P_{\mu\nu}(x, y)$ are completely given.

It follows from (3) that $-C_n/2^n = 1 - C_n$; therefore, putting $C_n = C$, we have in $(0, 1)$

$$\begin{aligned}
P_{\mu\nu}(x, x) = P_{\mu\nu}(x) &= Cx^2 + (1 - C)x^4,\\
P_\nu(x) &= Cx + (1 - C)x^2,\\
P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) &= C(1 - C)x^2(1 - x)^2,
\end{aligned}$$

which is $< 0$ for every $x$ in $(0, 1)$ since $C > 1$. For $x \le 0$, $P_{\mu\nu}(x)$ and $P_\nu(x)$ both equal zero, and for $x \ge 1$ they both equal 1. Therefore our conditions of paragraph 1 are fulfilled. We see that $C_n$ tends towards unity as $n \to \infty$; therefore for every $x$ in $(0, 1)$, $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)$ tends towards zero: we have "convergence towards independence," but by no means independence.

This example was based on a symmetric density. Let us give an example of asymmetric and arithmetic distributions. For the sake of simplicity let $P_1(x), P_2(x), \dots$ be arithmetic distributions, each with only three steps, at $x = 0$, 1 and 2. As starting point we take the $n$-dimensional arithmetic distribution $v(x_1, x_2, \dots, x_n)$, which gives the probability that the first result equals $x_1$, the second $x_2$, ..., the $n$th $x_n$, the $x_\nu$ being equal to 0 or 1 or 2; thus $v(x_1, x_2, \dots, x_n)$ takes $3^n$ values, the sum of which equals unity. We deduce the two-dimensional distributions $v_{\mu\nu}(x, y)$, e.g. $v_{12}(x, y) = \sum v(x, y, x_3, \dots, x_n)$, the probability that the first result equals $x$ and the second $y$, and finally the $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$, etc. According to the definitions of $P_\nu(x)$ and $P_{\mu\nu}(x)$ we have then:

(6)

therefore

(7)
$$P_\nu(x) = \begin{cases} 0 & (x < 0)\\ v_\nu(0) & (0 \le x < 1)\\ v_\nu(0) + v_\nu(1) & (1 \le x < 2)\\ 1 & (2 \le x), \end{cases} \tag{8}$$

$$P_{\mu\nu}(x) = \begin{cases} 0 & (x < 0)\\ v_{\mu\nu}(0, 0) & (0 \le x < 1)\\ v_{\mu\nu}(0, 0) + v_{\mu\nu}(1, 0) + v_{\mu\nu}(0, 1) + v_{\mu\nu}(1, 1) & (1 \le x < 2)\\ 1 & (2 \le x). \end{cases} \tag{9}$$

Now we subject $v(x_1, \dots, x_n)$ to the following conditions: every $v(x_1, \dots, x_n)$ equals zero if it contains either at least two "zeros," or at least one "zero" and one "one," or at least two "ones." All the other $v$-values are supposed to be different from zero. Then we have

$$v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 0) = v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 1) = 0,$$

therefore $P_{\mu\nu}(x) = 0$ for $x < 2$ and $P_{\mu\nu}(x) = 1$ for $x \ge 2$. On the other hand $v_\nu(0) = v(2, 2, \dots, 2, 0, 2, \dots, 2)$ and $v_\nu(1) = v(2, 2, \dots, 2, 1, 2, \dots, 2)$; therefore $P_\nu(x) \ne 0$ for $0 \le x < 2$, and we have thus for every finite $n$

$$P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) \begin{cases} = 0 & \text{for } x < 0 \text{ and } x \ge 2,\\ < 0 & \text{for } 0 \le x < 2. \end{cases}$$

Therefore the condition (b) of paragraph 1 is fulfilled, and thus (12) of paragraph 3 holds.
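The second example can be reproduced for a small $n$ by building one admissible law $v$ and evaluating $P_\nu$ and $P_{\mu\nu}$ directly. The text only requires the admissible points (those with at most one coordinate different from 2) to carry nonzero mass. The weights 1, 2, 3, ... used below are a hypothetical choice, as is $n = 4$. The check confirms that $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)$ vanishes outside $[0, 2)$ and is negative inside.

```python
from fractions import Fraction
from itertools import product

def build_v(n):
    """A joint law on {0,1,2}^n satisfying the text's conditions: mass only on points
    with at most one coordinate different from 2 (hypothetical weights 1, 2, 3, ...)."""
    support = [x for x in product((0, 1, 2), repeat=n) if sum(c != 2 for c in x) <= 1]
    weights = {x: Fraction(i + 1) for i, x in enumerate(support)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def p_single(v, nu, x):
    """P_nu(x): probability that the nu-th result is <= x."""
    return sum(w for pt, w in v.items() if pt[nu] <= x)

def p_joint(v, mu, nu, x):
    """P_munu(x): probability that both the mu-th and nu-th results are <= x."""
    return sum(w for pt, w in v.items() if pt[mu] <= x and pt[nu] <= x)
```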

I hope to have the opportunity to discuss more general applications of this 
theorem later. 

A generalization of the strong law of large numbers may be given in a similar way.
CONDITIONS FOR UNIQUENESS IN THE PROBLEM OF MOMENTS

By M. G. Kendall


It was shown by Stieltjes [1] that in some circumstances it is possible for two different frequency distributions to have the same set of moments. For instance, the integral


around a contour consisting of the positive $x$-axis, the infinite quadrant and the positive $y$-axis is seen to be zero, and it follows that

$$\int_0^\infty x^n \sin\left(x^{1/4}\right) e^{-x^{1/4}}\, dx = 0.$$

Thus the frequency distribution

$$dF = \frac{1}{24}\, e^{-x^{1/4}}\left(1 - \lambda \sin x^{1/4}\right) dx, \qquad 0 < x < \infty, \quad 0 < \lambda < 1, \tag{1}$$

has moments which are independent of $\lambda$, and equation (1) may be regarded as defining a whole family of distributions each of which has the same moments. It is easy to see that moments of all orders exist, and in fact

$$\mu_r' \ \text{(about the origin)} = \tfrac{1}{6}(4r + 3)!.$$
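The vanishing integral admits an exact check, sketched here under a substitution of our own rather than the contour argument of the text. Putting $x = t^4$ gives $\int_0^\infty x^n \sin(x^{1/4})e^{-x^{1/4}}dx = 4\int_0^\infty t^{4n+3}e^{-t}\sin t\,dt$. The standard Laplace-transform identity $\int_0^\infty t^m e^{-(1-i)t}dt = m!/(1-i)^{m+1}$ then gives the value $m!\,\mathrm{Im}\,(1+i)^{m+1}/2^{m+1}$. For $m = 4n + 3$ we have $(1+i)^{4n+4} = (-4)^{n+1}$, which is real, so the integral vanishes. The code evaluates this with integer Gaussian arithmetic, so no quadrature is involved. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def gauss_pow(a, b, e):
    """(a + b i)^e with exact integer arithmetic, returned as (real, imag)."""
    re, im = 1, 0
    for _ in range(e):
        re, im = re * a - im * b, re * b + im * a
    return re, im

def laplace_sin_moment(m):
    """Exact value of the integral of t^m e^{-t} sin t over (0, inf):
    Im(m! / (1 - i)^(m+1)) = m! * Im((1 + i)^(m+1)) / 2^(m+1)."""
    _, im = gauss_pow(1, 1, m + 1)
    return Fraction(factorial(m) * im, 2 ** (m + 1))

def stieltjes_perturbation(n):
    """Integral of x^n sin(x^{1/4}) e^{-x^{1/4}} over (0, inf), via x = t^4."""
    return 4 * laplace_sin_moment(4 * n + 3)

def moment(r):
    """mu'_r of (1), independent of lambda: (1/24) * 4 * (4r + 3)!."""
    return Fraction(4 * factorial(4 * r + 3), 24)
```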


A second example of the same kind, also due to Stieltjes, is the distribution

$$dF = \frac{1}{e^{1/4}\sqrt{\pi}}\, x^{-\log x}\left\{1 - \lambda \sin(2\pi \log x)\right\} dx, \qquad 0 < x < \infty, \quad 0 < \lambda < 1, \tag{2}$$

for which

$$\mu_r' = e^{r(r+2)/4}.$$
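Substituting $x = e^u$ turns the distribution (2) into $e^{-1/4}\pi^{-1/2}e^{-u^2}(1 - \lambda\sin 2\pi u)\,du$ with $x^r = e^{ru}$. The $r$th moment is then a Gaussian-type integral. Its $\lambda$-free part is $e^{(r+1)^2/4 - 1/4} = e^{r(r+2)/4}$, and the $\sin$ part vanishes because the integrand becomes odd after centering at $u = (r+1)/2$. This derivation is ours, not quoted from the source. The sketch below is a numerical check of that claim using the trapezoidal rule, which is extremely accurate for such smooth, rapidly decaying integrands. The half-width 12 and step 0.005 are assumptions chosen so that truncation and discretization errors are far below the test tolerance.

```python
from math import exp, pi, sin, sqrt

def moment_numeric(r, lam, half_width=12.0, h=0.005):
    """mu'_r of (2) by the trapezoidal rule in u = log x."""
    c = (r + 1) / 2                       # centre of the factor e^{(r+1)u - u^2}
    steps = round(2 * half_width / h)
    total = 0.0
    for s in range(steps + 1):
        u = c - half_width + s * h
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * exp((r + 1) * u - u * u) * (1 - lam * sin(2 * pi * u))
    return total * h / (exp(0.25) * sqrt(pi))
```

Any $\lambda$ in $(0, 1)$ gives the same values as $\lambda = 0$, illustrating that the moments do not determine the distribution.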

The question naturally arises, what are the conditions under which a given 
set of moments determines a frequency distribution uniquely? The question 
is of great interest to mathematicians, being closely linked with problems in the 
theory of asymptotic series, continued fractions and quasi-analytic functions; 
and it also has importance for statisticians since there is sometimes occasion to 
be satisfied that a problem of finding a frequency distribution has been uniquely 
solved by the ascertainment of its moments or semi-invariants. Stieltjes himself considered a more general problem: given a set of constants $c_0, c_1, \dots, c_r, \dots$, does there exist a function $F$, non-decreasing and possessing an infinite number of points of increase, such that

$$\int_0^\infty x^r\, dF = c_r, \tag{3}$$

and under what conditions is $F$ unique, except for an additive constant?
Stieltjes showed that if we express the series

$$\sum_{r=0}^{\infty} (-1)^r \frac{c_r}{z^{r+1}} \tag{4}$$

as a continued fraction of the form

$$\frac{1}{a_1z+}\,\frac{1}{a_2+}\,\frac{1}{a_3z+}\cdots\frac{1}{a_{2n-1}z+}\,\frac{1}{a_{2n}+}\cdots$$

it is a necessary and sufficient condition for the existence of at least one $F$ that all the $a$'s be positive; and the function is unique or not according as the series $\sum_r a_r$ diverges or converges. (If the $a$'s are positive it must do one or the other.) The integral of equation (3) is to be interpreted in the general Stieltjes sense, so that the result applies to discontinuous as well as to continuous distributions. This is also true of the results obtained below.

Hamburger [2] discussed the similar problem when the limits of the integral in equation (3) are $\pm\infty$, and showed that a function $F$ exists if the expression of (4) as a continued fraction of the form

$$\frac{b_0}{a_0 + z+}\,\frac{b_1}{a_1 + z+}\,\frac{b_2}{a_2 + z+}\cdots$$

gives positive values of the $b$'s. In order that $F$ may be unique it is necessary and sufficient that the continued fraction be completely (vollständig) convergent in the sense defined by Hamburger.

Unfortunately these criteria, though mathematically complete, are not very 
useful to statisticians because as a rule it is too difficult to express the coefficients 
a and b explicitly enough in terms of the given c's to enable questions of sign or
of convergence to be decided. So far as I know, no more convenient criterion
for the general Stieltjes problem has been found; but progress is possible if one
considers the narrower question: given a set of moments, is the distribution
which furnished them unique, that is to say, can any other distribution have 
furnished them? This is more limited than the Stieltjes problem because we 
know that at least one solution exists. 

Contributions to this subject have been made by Lévy [3] and Carleman [4]. Lévy shows that if moments of all orders exist and are positive, it is a sufficient condition for them to determine a distribution uniquely that $\mu_n^{1/n}/n$ remains finite as $n$ tends to infinity. (Here and elsewhere in this paper $\mu_r$ refers to the
moment of order $r$ about any point, not necessarily the mean.) Carleman shows that, for the case of limits $-\infty$ to $+\infty$, the moments determine the distribution uniquely if

$\sum_{r=1}^{\infty} \dfrac{1}{\mu_{2r}^{1/(2r)}}$

diverges. For the limits 0 to $\infty$ he gives the corresponding series

$\sum_{r=1}^{\infty} \dfrac{1}{\mu_r^{1/(2r)}},$

a criterion which can be improved upon, as will be shown below.

The purpose of this paper is to develop criteria of this kind more systematically 
and to give more general criteria suitable in cases where the moments are not 
known explicitly but the behavior of the frequency distribution at its terminals 
is known. 

Three preliminary points necessary for the later argument may be noted.
(1) Define the absolute moment of order $r$ by

$\nu_r = \int_{-\infty}^{\infty} |x|^r\,dF,$

and recall that

$\nu_1 \le \nu_2^{1/2} \le \cdots \le \nu_r^{1/r} \le \cdots$

(cf. Hardy and others [5]). In other words the quantities $\nu_r^{1/r}$ form a non-decreasing positive sequence and their reciprocals a non-increasing positive sequence.

(2) The quantity $\nu_n^{1/n}/n$ must either tend to a limit or diverge to infinity as $n \to \infty$. For suppose that

$\overline{\lim}\, \nu_n^{1/n}/n = k, \qquad \underline{\lim}\, \nu_n^{1/n}/n = l.$

Write temporarily $a_n = \nu_n^{1/n}$. Then, given $\epsilon$, there is an $N$ such that

$a_n/n > k - \epsilon$

for an infinity of values of $n$ greater than $N$. Similarly there is an $M$ such that

$a_n/n < l + \epsilon$

for an infinity of values of $n$ greater than $M$. Now choose $p$ such that $a_p$, $a_{p+1}$ are two consecutive values, one near the upper limit and one near the lower limit. This can always be done, and we can take $p$ as large as we please. We then have

$a_p > p(k - \epsilon), \qquad a_{p+1} < (p + 1)(l + \epsilon),$

and hence, since $a_{p+1} \ge a_p$,

$(k - \epsilon)p < (p + 1)(l + \epsilon),$

giving

$k - l < 2\epsilon + \dfrac{l}{p} + \dfrac{\epsilon}{p}.$

Thus $k - l$ can be made as small as we please, and is therefore zero.

The argument can be very simply adapted to the case in which $k$ is infinite. If $l$ is not finite, then $k$, being not less than $l$, is infinite. Thus as $n \to \infty$ either $\lim a_n/n$ exists or $a_n/n \to \infty$.

(3) If any moment fails to converge, so will all moments of higher order. It is evident that more than one distribution can exist having a limited number of finite moments given and the remainder infinite. Thus we need only consider the case when moments of all orders exist. Furthermore, if any even moment exists, the absolute moment of next lowest order must exist. For if

$\int_{-\infty}^{\infty} x^{2n}\,dF$

exists, then $\int_{-\infty}^{0} x^{2n}\,dF$ and $\int_{0}^{\infty} x^{2n}\,dF$ each exist separately, each being positive. Hence $\int_{-\infty}^{0} x^{2n-1}\,dF$ and $\int_{0}^{\infty} x^{2n-1}\,dF$ exist separately, and thus

$\nu_{2n-1} = -\int_{-\infty}^{0} x^{2n-1}\,dF + \int_{0}^{\infty} x^{2n-1}\,dF$

exists. Hence we need only consider the case in which absolute moments of all orders exist.

Theorem 1. A set of moments determines a distribution uniquely if the series

$\sum_{r=0}^{\infty} \dfrac{\nu_r t^r}{r!}$

converges for some real non-zero $t$.

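Consider the characteristic function

$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF.$

This is uniformly continuous in $t$, and so are its derivatives of all orders. Thus we have, in the neighborhood of $t = 0$, the Maclaurin expansion

$\phi(t) = \sum_{r=0}^{n} \dfrac{t^r}{r!}\left[\dfrac{d^r \phi}{dt^r}\right]_{t=0} + R = \sum_{r=0}^{n} \dfrac{(it)^r}{r!}\,\mu_r + R.$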


* This proof is necessary to the use of limits in the following theorems, but Theorems 2 and 3 are equally valid if $\overline{\lim}$ is substituted for lim therein. It is not generally true that if $a_n$ and $b_n$ are increasing monotonic sequences either $\lim a_n/b_n$ exists or $a_n/b_n \to \infty$ as
Consequently, under the condition of the theorem, which implies that

$\sum_{r=0}^{\infty} \dfrac{(it)^r}{r!}\,\mu_r$

is absolutely convergent for some radius $\rho$, $\phi(t)$ has a Taylor expansion in the neighborhood of the origin. It is thus uniquely determined by the moments for $|t| < \rho$. Furthermore, in the neighborhood of $t = t_0$ we have

$\phi(t) = \sum_{r=0}^{n} \dfrac{(t - t_0)^r}{r!} \int_{-\infty}^{\infty} (ix)^r e^{i t_0 x}\,dF + R.$

The modulus of the coefficient of $(t - t_0)^r/r!$ is not greater than $\nu_r$. Therefore $\phi(t)$ can be expanded in the neighborhood of $t = t_0$ in a Taylor series with a radius of convergence at least equal to $\rho$. Hence the function defining $\phi(t)$ in the neighborhood of the origin can be continued analytically throughout the range $-\infty$ to $+\infty$, and $\phi(t)$ is uniquely determined in that range.

But the characteristic function uniquely determines the distribution; and hence the theorem follows.

As a result of Theorem 1 we have the following generalization of the criterion given by Lévy.

Theorem 2. A set of moments completely determines a distribution if $\lim_{n\to\infty} \nu_n^{1/n}/n$ is finite.

It has already been seen that unless $\nu_n^{1/n}/n$ becomes infinite the limit exists. By the Cauchy test for convergence the series $\sum \nu_r t^r/r!$ converges if

(7) $\lim_{n\to\infty} \left(\dfrac{\nu_n t^n}{n!}\right)^{1/n} < 1.$

As $n \to \infty$, $(n!)^{1/n}$ behaves, in accordance with Stirling's theorem, like $(\sqrt{2\pi n}\, e^{-n} n^n)^{1/n}$, i.e. like $n/e$. Consequently the condition (7) becomes

$\lim_{n\to\infty} \left\{\dfrac{\nu_n^{1/n}}{n}\right\} e t < 1.$

Thus if $\lim \nu_n^{1/n}/n = k$, say, the inequality (7) is satisfied for $t < 1/(ek)$, and the theorem follows.
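The limit in Theorem 2 can be watched numerically. The sketch below takes the exponential distribution $e^{-x}\,dx$ (discussed later in the paper) as an assumed example. For it $\nu_r = r!$, so $\nu_n^{1/n}/n$ tends to $k = 1/e$ and the radius bound $1/(ek)$ of the proof approaches 1. The helper names are hypothetical. Working with logarithms via `math.lgamma` avoids overflow.

```python
import math

def stirling_ratio(n):
    """(n!)^(1/n) divided by n/e; by Stirling's theorem this tends to 1."""
    return math.exp(math.lgamma(n + 1) / n) / (n / math.e)

def levy_quantity(log_nu, n):
    """nu_n^(1/n) / n, computed from log(nu_n) to avoid overflow."""
    return math.exp(log_nu(n) / n) / n

# Assumed example: dF = e^{-x} dx on (0, inf), for which nu_r = mu_r = r!.
log_nu_exponential = lambda n: math.lgamma(n + 1)

for n in (10, 100, 1000, 10000):
    k_n = levy_quantity(log_nu_exponential, n)
    print(n, stirling_ratio(n), k_n, 1 / (math.e * k_n))
```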

An important corollary, which enables us to disregard the absolute moments (which may not be given if part of the range is negative), is

Theorem 3. A set of moments uniquely determines a distribution if $\lim_{n\to\infty} \mu_{2n}^{1/(2n)}/(2n)$ is finite.

For

$\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/(2n)} = \mu_{2n}^{1/(2n)}.$

Thus,

$\lim \dfrac{1}{2n-1}\,\nu_{2n-1}^{1/(2n-1)} \le \lim \dfrac{1}{2n-1}\,\mu_{2n}^{1/(2n)},$

and the left-hand limit is therefore finite if the limit on the right is finite. Thus $\lim \nu_n^{1/n}/n$ must be finite, since it cannot be greater than the greater of the two limits of $\nu_{2n-1}^{1/(2n-1)}/(2n-1)$ and $\nu_{2n}^{1/(2n)}/(2n)$. The theorem then follows from Theorem 2.

Now consider the series $\sum_{r=1}^{\infty} 1/\mu_{2r}^{1/(2r)}$. Since the successive terms form a monotonic sequence, it is a sufficient as well as a necessary condition for convergence that $n/\mu_{2n}^{1/(2n)}$ tend to zero. Thus, if the series is divergent, $n/\mu_{2n}^{1/(2n)}$ cannot tend to zero, and so $\mu_{2n}^{1/(2n)}/n$ cannot become infinite. Hence it must tend to a finite limit, which may in particular be zero. Hence from Theorem 3 we get

Theorem 4. A frequency distribution is uniquely determined by its moments if

$\sum_{r=1}^{\infty} \dfrac{1}{\mu_{2r}^{1/(2r)}}$ diverges.

Since $1/\nu_r^{1/r}$ is a decreasing sequence, the series $\sum 1/\nu_r^{1/r}$ converges or diverges with $\sum 1/\nu_{2r}^{1/(2r)}$. The Carleman criterion, given by him for the case of limits $\pm\infty$, follows. For the case of limits 0 to $\infty$ the absolute moments are the same as the moments, and the criterion can be the divergence of either $\sum 1/\mu_r^{1/r}$ or $\sum 1/\mu_{2r}^{1/(2r)}$. Since $\mu_r$ is greater than unity in the type of case under consideration, the former series provides a more stringent test than that given by Carleman.
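Theorem 4 can be tried on concrete moment sequences. The sketch below uses hypothetical helper names. For the normal distribution, $\mu_{2r} = (2r)!/(2^r r!)$. The terms $1/\mu_{2r}^{1/(2r)}$ then decay roughly like $r^{-1/2}$ and the partial sums keep growing, so the moments determine the distribution. For contrast, the sketch also uses an assumed sequence with $\log \mu_{2r} = (2r+1)^2/4$, which grows like $e^{r^2}$. There the series converges and Theorem 4 gives no conclusion.

```python
import math

def carleman_partial_sum(log_mu_even, terms):
    """Partial sum of sum_{r=1}^{terms} 1 / mu_{2r}^{1/(2r)},
    where log_mu_even(r) returns log(mu_{2r})."""
    return sum(math.exp(-log_mu_even(r) / (2 * r)) for r in range(1, terms + 1))

# Standard normal about its mean: mu_{2r} = (2r)! / (2^r r!).
log_mu_normal = lambda r: math.lgamma(2 * r + 1) - r * math.log(2) - math.lgamma(r + 1)

# Hypothetical moment sequence with log mu_{2r} = (2r + 1)^2 / 4 (growth like e^{r^2}).
log_mu_fast = lambda r: (2 * r + 1) ** 2 / 4

for terms in (10, 100, 1000, 10000):
    print(terms, carleman_partial_sum(log_mu_normal, terms),
          carleman_partial_sum(log_mu_fast, terms))
```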

At first sight it is rather surprising that the uniqueness of the distribution depends only on the behavior of the even moments. This is all the more so because a simple extension of the above result shows that a sufficient condition for uniqueness is the divergence of $\sum 1/\mu_{4n}^{1/(4n)}$, or of the corresponding series formed from any infinite subset chosen from the moments. It will, however, be remembered that the odd moments are conditioned to some extent by the even moments, and that uniqueness is really determined by the limiting form of $\nu_n^{1/n}/n$ as $n$ tends to infinity.

It is evident that other tests may be derived from Theorem 1 by using the various tests for the convergence of an infinite series. For instance, it is a sufficient condition for a set of moments to determine uniquely a distribution with positive range that

— 1 _L ^ _1_ ri( ^ \ wtiprp ^ ^ 

n\/ (^TTl)! " + n + ^VW’ ^ > 0 

i.e. that 

It may be noted in passing that the distribution

$dF = e^{-x}\,dx, \qquad 0 < x < \infty,$

for which

$\mu_r$ (about origin) $= r!$,

is completely determined by its moments. In fact, by direct reference to Theorem 1 we see that the series $\sum t^r$ converges for $|t| < 1$.
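The remark can be checked directly. With $\mu_r = r!$, the moment series $\sum (it)^r \mu_r/r!$ is the geometric series $\sum (it)^r$. For $|t| < 1$ its sum is $1/(1 - it)$, the characteristic function of $e^{-x}\,dx$. The sketch below compares partial sums with that closed form, using a hypothetical helper `moment_series` that is not from the paper. It also shows the partial sums blowing up at $t = 1.5$, outside the circle of convergence.

```python
import math

def moment_series(t, mu, terms):
    """Partial sum of sum_{r < terms} (i t)^r mu(r) / r!, the Maclaurin series
    of the characteristic function built from the moments mu(r)."""
    total = 0j
    for r in range(terms):
        total += (1j * t) ** r * (mu(r) / math.factorial(r))
    return total

mu_exponential = math.factorial                 # mu_r = r! for dF = e^{-x} dx
phi_exponential = lambda t: 1 / (1 - 1j * t)    # its characteristic function

for t in (0.25, 0.5, 0.9):
    print(t, moment_series(t, mu_exponential, 400), phi_exponential(t))
```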
Theorem 5. A frequency distribution of finite range is uniquely determined by its moments.
For if the range is 0 to $A$ we have

$\mu_r = \int_0^A x^r\,dF \le A^r,$

and hence $1/\mu_r^{1/r} \ge 1/A$, so that the series $\sum 1/\mu_r^{1/r}$ is divergent.

A proof for the case when the frequency distribution is continuous has been
given by Lévy, though on entirely different lines from the above.
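The bound in Theorem 5 is easy to see numerically. The sketch below assumes, for illustration, the uniform distribution on $(0, A)$, for which $\mu_r = A^r/(r+1)$. Every term $1/\mu_r^{1/r} = (r+1)^{1/r}/A$ is then at least $1/A$, so the partial sums of $\sum 1/\mu_r^{1/r}$ grow at least linearly. The helper name is hypothetical.

```python
def theorem5_terms(A, r_max):
    """Terms 1/mu_r^(1/r), r = 1..r_max, for the uniform law on (0, A).
    Here mu_r = A^r / (r + 1), so 1/mu_r^(1/r) = (r + 1)^(1/r) / A; this form
    avoids overflowing A^r for large r."""
    return [(r + 1) ** (1 / r) / A for r in range(1, r_max + 1)]

terms = theorem5_terms(A=3.0, r_max=1000)
print(min(terms), 1 / 3.0, sum(terms))
```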

Theorem 6. A frequency distribution of infinite range is uniquely determined
by its moments if it tends to zero at the infinite terminals faster than $e^{-x}$.

Consider first of all the case when only one end of the range is infinite, so that we may take the range to be 0 to $\infty$.

If $(\mu_n/n!)^{1/n}$ has a finite limit, the distribution is unique, by Theorem 2. We have then only to consider the cases (if any) in which $(\mu_n/n!)^{1/n}$ tends to infinity. It will be shown that in fact such cases do not occur.

Given any (small) $\epsilon$ there exists an $X$ such that

$f(x) < \epsilon e^{-x}, \qquad x > X,$

where $f(x)$ is the distribution. Thus

(9) $\int_X^\infty f(x)\,x^n\,dx < \epsilon \int_X^\infty e^{-x} x^n\,dx < \epsilon\, n!.$

This is true for all $n$, and $X$ is independent of $n$. Now,

$\int_0^\infty f(x)\,x^n\,dx = \int_0^X f(x)\,x^n\,dx + \int_X^\infty f(x)\,x^n\,dx.$

The first integral on the right is not greater than $X^n$. By our hypothesis, the integral on the left tends, for large $n$, to something of greater order than $n!$, and hence to something of greater order than $n^n$. This is of greater order than $X^n$, since $X$, however large, is independent of $n$. Consequently the second integral on the right is also of greater order than $n!$. But this is contrary to equation (9).

The case for the range which is infinite in both directions may be dealt with similarly.

It is easily seen that the two examples of equations (1) and (2) do not tend to zero faster than $e^{-x}$.

Except for the general result of Stieltjes, all the above criteria provide sufficient conditions, but whether the condition of Theorem 1 is also necessary is not certain. An inquiry into the circumstances in which the moment-series of Theorem 1 does not converge throws some light on the question.

It will be remembered that the characteristic function always exists and is uniformly continuous in $t$. Since the moments of all orders are assumed to exist, we always have

$\mu_r = \left[\dfrac{d^r \phi(t)}{d(it)^r}\right]_{t=0}.$

Thus, if $\phi(t)$ can be expanded in an infinite Taylor series, that series must be $\sum_{r=0}^{\infty} \dfrac{(it)^r}{r!}\,\mu_r$. And if this series does not converge, then $\phi(t)$ cannot be expanded as an infinite Taylor series. But it can always be expanded in the finite form with remainder

$\phi(t) = \sum_{r=0}^{n} \dfrac{(it)^r}{r!}\,\mu_r + R.$

Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of $t$ only asymptotically.

Now it is known that there exist an infinite number of functions which have a given set of coefficients in an asymptotic expansion. For instance, if $\phi(t)$ has an asymptotic expansion in $t$, the functions $\phi(t) + \lambda e^{-1/t}$ all have the same expansion as $t \to 0+$. It is therefore hardly surprising that when the conditions of Theorem 1 break down there can be more than one frequency distribution with the same set of moments.

But it does not follow from what has been said that there must be more than one frequency distribution. There must be more than one function, but those functions may not qualify as frequency distributions; e.g. they may be negative in part of the range. In the example just given, $e^{-1/t}$ cannot be a characteristic function, for it does not obey the well-known condition that $\phi(t)$ and $\phi(-t)$ should be conjugate.

However, the question is more of mathematical than of statistical interest, since the criteria provided above are likely to be adequate for the distributions encountered in practice. For example, they establish the uniqueness of the Pearson curves (including the normal curve), the Poisson and the binomial. It would seem that distributions like those of equations (1) and (2) will appear only as statistical curiosities.
ON SAMPLES FROM A NORMAL BIVARIATE POPULATION

By C. T. Hsu

1. Introduction. In a number of papers written during the last ten years, J. Neyman and E. S. Pearson* have discussed certain general principles underlying the choice of tests of statistical hypotheses. They have suggested that any formal treatment of the subject requires in the first place the specification of (i) the hypothesis to be tested, say $H_0$, and (ii) the admissible alternative hypotheses. An appropriate test will then consist of a rule to be applied to observational data for rejecting $H_0$, in such a way that (iii) the risk of rejecting $H_0$ when it is true is fixed at some desired value (e.g., 0.05 or 0.01), and (iv) the risk of failing to reject $H_0$ when some one of the admissible alternatives is true is kept as small as possible. With these general principles in mind, they have investigated how best the condition (iv) may be satisfied in different classes of problems. In many cases, though not in all, it has been found that the conditions are satisfied by the test obtained from the use of what has been termed the likelihood ratio [9], [10], [14]. Once the problem has been specified, the test criterion is usually very easily found, although its sampling distribution, if $H_0$ is true, often presents great difficulties. In the present paper, I propose to use this method to obtain appropriate tests for a number of hypotheses concerning two normally correlated variables. The investigation was suggested by a recent application of the method by W. A. Morgan [6] to a problem originally discussed by D. J. Finney [3].

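2. The hypotheses and the appropriate criteria. A sample of two variables $x_1$ and $x_2$ is supposed to have been drawn at random from a normal bivariate population, with the distribution

(1) $p(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}} \exp\left\{-\dfrac{1}{2(1-\rho_{12}^2)}\left[\dfrac{(x_1-\xi_1)^2}{\sigma_1^2} - \dfrac{2\rho_{12}(x_1-\xi_1)(x_2-\xi_2)}{\sigma_1\sigma_2} + \dfrac{(x_2-\xi_2)^2}{\sigma_2^2}\right]\right\},$

where $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, and $\rho_{12}$ are the population parameters.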

Morgan tested the hypothesis that the variances of the two variables are equal, i.e.,

$H_1$: $\sigma_1 = \sigma_2$.


* See bibliography at the end of the paper. 
Other hypotheses that will be considered in the present paper are as follows:

$H_2$: Assuming $\sigma_1 = \sigma_2$; to test $\rho_{12} = \rho_0$.

$H_3$: Assuming $\sigma_1 = \sigma_2$; to test $\xi_1 = \xi_2$.

$H_4$: To test simultaneously $\sigma_1 = \sigma_2$, $\rho_{12} = \rho_0$.

$H_5$: To test simultaneously $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

$H_6$: Assuming $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$; to test $\rho_{12} = \rho_0$.

$H_7$: Assuming $\sigma_1 = \sigma_2$ and $\rho_{12} = \rho_0$; to test $\xi_1 = \xi_2$.


Derivation of the criteria. Let $x_{1i}$, $x_{2i}$ be the measurements of the two characters on the $i$th individual of the sample. Then the joint elementary probability law of the two sets of $n$ observations $E = (x_{11}, x_{12}, \ldots, x_{1n};\ x_{21}, x_{22}, \ldots, x_{2n})$ is

(2) $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12}) = \prod_{i=1}^{n} p(x_{1i}, x_{2i}),$

with $p(x_1, x_2)$ given by (1).

It will be convenient to denote by A, B, C, D the following conditions of the population from which the sample is supposed to be drawn.

(A) that stated in equation (1).

(B) that stated in the equation for $H_1$, namely $\sigma_1 = \sigma_2 = \sigma$ ($\sigma$ being unspecified).

(C) $\xi_1 = \xi_2 = \xi$ ($\xi$ being unspecified).

(D) $\rho_{12} = \rho_0$.


Neyman and Pearson's method affords a simple rule for obtaining appropriate test criteria once two sets of conditions have been defined. These are

(a) the conditions which can be assumed to be satisfied in any case, and

(b) the conditions which are satisfied if the hypothesis to be tested is true.

The conditions (a) define a class $\Omega$ of admissible populations. The conditions (b) define a sub-class $\omega$ of $\Omega$ to which the population must belong if the hypothesis tested be true.

When the parameters vary in such a way that the population sampled always belongs to $\Omega$, the maximum value of $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12})$ is called $p(\Omega\ \text{max.})$. The maximum value when the population is restricted to $\omega$ is called $p(\omega\ \text{max.})$. The likelihood ratio for testing the hypothesis specifying the subset $\omega$ has been defined to be

(3) $\lambda = \dfrac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})}.$
It will be seen that $0 \le \lambda \le 1$. By referring $\lambda$, or a monotonic function of $\lambda$, to its sampling distribution when the hypothesis tested is true, we obtain a scale on which to assess our judgment of the truth of the hypothesis tested.

For each of the hypotheses $H_1$ to $H_7$, $\lambda$ of (3) can be found. However, we shall use a more convenient criterion

(4) $L = \lambda^{2/n},$

which is a monotonic function of $\lambda$.

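Thus the respective test criteria are found to be:

For $H_1$:

(5) $L_1 = \dfrac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{(s_1^2 + s_2^2)^2 (1 - R_1^2)},$

where $R_1 = \dfrac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}$ is the estimate of $\rho_{12}$ when $\sigma_1$ and $\sigma_2$ are assumed to be equal.

For $H_2$:

(6) $L_2 = \dfrac{(1 - \rho_0^2)(1 - R_1^2)}{(1 - \rho_0 R_1)^2}.$

For $H_3$:

(7) $L_3 = \left[1 + \dfrac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}\right]^{-1}.$

For $H_4$:

(8) $L_4 = \dfrac{4 s_1^2 s_2^2 (1 - r_{12}^2)(1 - \rho_0^2)}{(s_1^2 + s_2^2)^2 (1 - \rho_0 R_1)^2} = L_1 \times L_2.$

For $H_5$:

(9) $L_5 = \dfrac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{\{s_1^2 + s_2^2 + \frac12(\bar x_1 - \bar x_2)^2\}^2 (1 - R_2^2)} = L_1 \times L_3.$

For $H_6$:

(10) $L_6 = \dfrac{(1 - \rho_0^2)(1 - R_2^2)}{(1 - \rho_0 R_2)^2},$

where $R_2 = \dfrac{2 r_{12} s_1 s_2 - \frac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \frac12(\bar x_1 - \bar x_2)^2}$ is the estimate of $\rho_{12}$ when both the $\sigma$'s and the $\xi$'s are assumed to be equal.

For $H_7$:

(11) $L_7 = \left[1 + \dfrac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)}\right]^{-2}.$

The different hypotheses are also given in Table V, at the end of this paper, together with the conditions defining the sets $\Omega$ and $\omega$, and the appropriate likelihood criteria.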

To complete the solution we must find the distribution of $L$, or of some monotonic function of $L$, in each case when the hypothesis tested is true. This is needed to assess the significance of an observed value of $L$.


3. The distributions of the criteria. In order to simplify the problem of finding the distributions of the criteria, consider the following transformation:

(12) $X_i = (x_{1i} - x_{2i})/\sqrt{2}, \qquad Y_i = (x_{1i} + x_{2i})/\sqrt{2}.$

It is clear that in view of (1) $X$ and $Y$ will be two normally correlated variables. We shall denote this property by A′, corresponding to A. The conditions B′, C′, D′ corresponding to B, C, D respectively are as follows:

B′: $\rho_{XY} = 0$,

C′: $\xi_X = 0$,

D′: $\sigma_Y^2 = \gamma_0 \sigma_X^2$,

where

(13) $\gamma_0 = \dfrac{1 + \rho_0}{1 - \rho_0}$ (when $\rho_{XY} = 0$).

Thus we have the equivalent hypotheses $H_1', H_2', \ldots, H_7'$ corresponding to $H_1, H_2, \ldots, H_7$. The likelihood ratios $L_1', L_2', \ldots, L_7'$ may be determined in the same way as before. In view of the transformation (12), it will be seen that they are equal to $L_1, L_2, \ldots, L_7$ respectively.
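The invariance can be checked on data. The sketch below evaluates the criteria (5) to (11) directly from paired observations. It then recomputes $L_1$, $L_2$ and $L_3$ through $X = (x_1 - x_2)/\sqrt2$, $Y = (x_1 + x_2)/\sqrt2$ using (14), (16) and (19) of the text. The sample values are made up. Standard deviations use divisor $n$, as in maximum likelihood. $L_7$ is taken as $[1 + \cdots]^{-2}$, which is what $\lambda^{2/n}$ gives when $\sigma_Y^2 = \gamma_0 \sigma_X^2$. All helper names are hypothetical.

```python
import math

def summaries(x1, x2):
    """Means, standard deviations with divisor n (maximum-likelihood form)
    and the product-moment correlation r12 of paired observations."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    r12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)
    return m1, m2, s1, s2, r12

def criteria(x1, x2, rho0):
    """Criteria L1..L7 of (5)-(11), plus the estimates R1 and R2."""
    m1, m2, s1, s2, r = summaries(x1, x2)
    S = s1 ** 2 + s2 ** 2
    d2 = (m1 - m2) ** 2
    R1 = 2 * r * s1 * s2 / S
    R2 = (2 * r * s1 * s2 - d2 / 2) / (S + d2 / 2)
    L1 = 4 * s1 ** 2 * s2 ** 2 * (1 - r ** 2) / (S ** 2 * (1 - R1 ** 2))
    L2 = (1 - rho0 ** 2) * (1 - R1 ** 2) / (1 - rho0 * R1) ** 2
    L3 = 1 / (1 + d2 / (S - 2 * r * s1 * s2))
    L6 = (1 - rho0 ** 2) * (1 - R2 ** 2) / (1 - rho0 * R2) ** 2
    L7 = (1 + (1 + rho0) * d2 / (2 * (S - 2 * rho0 * r * s1 * s2))) ** -2
    return {"R1": R1, "R2": R2, "L1": L1, "L2": L2, "L3": L3,
            "L4": L1 * L2, "L5": L1 * L3, "L6": L6, "L7": L7}

def via_transformation(x1, x2, rho0):
    """Recompute L1, L2, L3 from X = (x1 - x2)/sqrt 2, Y = (x1 + x2)/sqrt 2."""
    X = [(a - b) / math.sqrt(2) for a, b in zip(x1, x2)]
    Y = [(a + b) / math.sqrt(2) for a, b in zip(x1, x2)]
    mX, _, sX, sY, rXY = summaries(X, Y)
    gamma0 = (1 + rho0) / (1 - rho0)
    u = sY ** 2 / sX ** 2                      # (15)
    return {"L1": 1 - rXY ** 2,                # (14)
            "L2": 4 * u / (gamma0 * (1 + u / gamma0) ** 2),   # (16)
            "L3": 1 / (1 + mX ** 2 / sX ** 2)}                # (19)

x1 = [1.2, 0.4, 2.2, 1.9, 0.7, 1.5]   # made-up illustrative data
x2 = [0.9, 0.8, 1.7, 2.4, 0.3, 1.1]
direct = criteria(x1, x2, rho0=0.6)
transformed = via_transformation(x1, x2, rho0=0.6)
for key in ("L1", "L2", "L3"):
    print(key, direct[key], transformed[key])
```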

The tests of the hypotheses $H_1'$, $H_2'$, $H_3'$ are now seen to be well known.

The test of $H_1'$: $\rho_{XY} = 0$ is the test for significance of a correlation coefficient, and the criterion $L_1$ becomes

(14) $L_1 = \lambda_{H_1}^{2/n} = 1 - r_{XY}^2.$

This test has been dealt with by Morgan [6] and Pitman [15], and has been referred to above.

The test of $H_2'$: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ when $\rho_{XY} = 0$ can be treated as an extension of Fisher's $z$-test [6], since $\gamma_0$ is specified. If we write

(15) $u = \dfrac{S_Y^2}{S_X^2} = \dfrac{1 + R_1}{1 - R_1} = \dfrac{s_1^2 + s_2^2 + 2 r_{12} s_1 s_2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2},$

the test criterion $L_2$ of (6) may be written

(16) $L_2 = \dfrac{4u}{\gamma_0 (1 + u/\gamma_0)^2}.$


It is well known that if $H_2'$ is true, then

(17) $p(u) = \dfrac{1}{\gamma_0 B[\frac12(n-1), \frac12(n-1)]} \left(\dfrac{u}{\gamma_0}\right)^{\frac12(n-3)} \left(1 + \dfrac{u}{\gamma_0}\right)^{-(n-1)}.$

The test appropriate to $H_2'$, and therefore to $H_2$, is the associated $z$-test ($z = \frac12 \log u/\gamma_0$) with degrees of freedom $f_1 = f_2 = n - 1$. It may be easily shown that the two values of $u$ cutting off equal tail areas from the distribution $p(u)$ will correspond to a single value of $L_2$.
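The equal-tail remark rests on a symmetry of (16). Writing $w = u/\gamma_0$, we have $L_2 = 4w/(1 + w)^2$, which is unchanged when $w$ is replaced by $1/w$. When $f_1 = f_2$, the equal-tail points of $w$ are reciprocals of one another. The sketch below borrows the tabulated points $x_1' = .198902$, $x_1'' = .801098$ that the paper gives later for $n = 10$. Here $x_1 = w/(1 + w)$ as in (42), and the helper name is hypothetical.

```python
def L2_of(w):
    """Criterion (16) as a function of w = u / gamma0: L2 = 4w / (1 + w)^2."""
    return 4 * w / (1 + w) ** 2

# Equal-tail points of x1 = w/(1 + w) for n = 10, alpha' = alpha'' = 0.025,
# as tabulated later in the paper; they satisfy x1'' = 1 - x1'.
x_lo, x_hi = 0.198902, 0.801098
w_lo, w_hi = x_lo / (1 - x_lo), x_hi / (1 - x_hi)
print(w_lo * w_hi, L2_of(w_lo), L2_of(w_hi))
```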

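The test of $H_3'$: $\xi_X = 0$ when $\rho_{XY} = 0$ is in the form of "Student's" $t$ test. If we write

(18) $\dfrac{t^2}{n-1} = \dfrac{\bar X^2}{S_X^2} = \dfrac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2},$

it follows that the test criterion $L_3$ of (7) may be written

(19) $L_3 = 1\Big/\left(1 + \dfrac{t^2}{n-1}\right).$

But it is well known that if $H_3'$ is true ($\xi_X = 0$), then

(20) $p(t) = \dfrac{1}{\sqrt{n-1}\, B[\frac12(n-1), \frac12]} \left(1 + \dfrac{t^2}{n-1}\right)^{-\frac12 n}.$

The 5% or 1% points of significance of $t$ may be obtained from Fisher's $t$-table [6] with degrees of freedom $f = n - 1$.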

The tests of $H_4$ and $H_5$. We infer from (14), (16) and (19) that $L_1$ is a function of $r_{XY}$, $L_2$ a function of $S_X$ and $S_Y$, and $L_3$ a function of $\bar X$ and $S_X$. It is clear that if $r_{XY}$ is distributed independently of $S_X$ and $S_Y$, then $L_1$ and $L_2$ are independent, i.e.,

(21) $p(L_1, L_2) = p(L_1)\,p(L_2),$

and that if $r_{XY}$ is distributed independently of $\bar X$ and $S_X$, then $L_1$ and $L_3$ are independent, i.e.,

(22) $p(L_1, L_3) = p(L_1)\,p(L_3).$

It is known that $\bar X$, $\bar Y$ are independent of $S_X$, $S_Y$, $r_{XY}$. It is known in addition that $r_{XY}$ is distributed independently of $S_X$, $S_Y$ if $\rho_{XY} = 0$. Therefore, if $H_1'$ is true, the relations (21) and (22) hold. Hence, knowing $p(L_1)$ and $p(L_2)$, a very simple transformation and integration gives $p(L_4)$. Similarly, the distribution of $L_5$ may be readily derived from those of $L_1$ and $L_3$.

But from the distribution of $r_{XY}$ when $\rho_{XY} = 0$, by transformation (14), the distribution of $L_1$ assuming $H_1'$ true is found to be

(23) $p(L_1) = \dfrac{1}{B[\frac12(n-2), \frac12]}\, L_1^{\frac12(n-4)} (1 - L_1)^{-\frac12}, \qquad 0 < L_1 < 1.$

If $H_2'$ is true, from (17), by transformation (16) we have
(24) $p(L_2) = \dfrac{1}{B[\frac12(n-1), \frac12]}\, L_2^{\frac12(n-3)} (1 - L_2)^{-\frac12}.$

Again, if $H_3'$ is true, from (20), by transformation (19), we have

(25) $p(L_3) = \dfrac{1}{B[\frac12(n-1), \frac12]}\, L_3^{\frac12(n-3)} (1 - L_3)^{-\frac12},$

which is the same as the distribution of $L_2$. Therefore, by comparing (21) and (22), we see that the distribution of $L_5$ when $H_5$ is true will be exactly the same as that of $L_4$ when $H_4$ is true. We shall therefore confine ourselves to the problem of obtaining the distribution of $L_4$ from those of $L_1$ and $L_2$.

Now

(26) $p(L_1, L_2) = \dfrac{L_1^{\frac12(n-4)} (1 - L_1)^{-\frac12}\, L_2^{\frac12(n-3)} (1 - L_2)^{-\frac12}}{B[\frac12(n-2), \frac12]\, B[\frac12(n-1), \frac12]}.$

Applying the transformation

(27) $L_4 = L_1 L_2, \qquad Z = L_2,$

and integrating with respect to $Z$ from $L_4$ to 1, we obtain

(28) $p(L_4) = \tfrac12(n - 2)\, L_4^{\frac12(n-4)}, \qquad 0 < L_4 < 1.$

Thus we can construct the values of $L_4$ at the 5% and 1% levels for different values of $n$, as given in Table I.




TABLE I
5% and 1% values of $L_4$ (or $L_5$)

     n       5%       1%
     5     .1357    .0464
     6     .2509    .1000
     7     .3017    .1585
     8     .3684    .2154
     9     .4249    .2683
    10     .4729    .3162
    12     .5493    .3981
    15     .6307    .4924
    20     .7169    .5995
    24     .7616    .6579
    30     .8074    .7197
    40     .8541    .7848
    60     .9019    .8532
   120     .9505    .9249
     ∞    1.0000   1.0000
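Because (28) integrates in closed form, $P(L_4 < c) = c^{(n-2)/2}$, and each tabulated point is simply $\alpha^{2/(n-2)}$. The sketch below regenerates the table; its function name is hypothetical. It agrees with the printed entries to four decimals in the rows checked here, for example $n = 5$, 10 and 120. For the 5% point at $n = 6$, however, the formula gives .2236 rather than the printed .2509.

```python
def table_one_entry(n, alpha):
    """Lower alpha-point c of L4 (or L5). From (28), P(L4 < c) = c^((n-2)/2),
    so c = alpha^(2/(n-2))."""
    return alpha ** (2 / (n - 2))

for n in (5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120):
    print(f"{n:4d}  {table_one_entry(n, 0.05):.4f}  {table_one_entry(n, 0.01):.4f}")
```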





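The test of $H_6$. Consider testing $\sigma_Y^2 = \gamma_0 \sigma_X^2$, assuming $\rho_{XY}$ and $\xi_X$ each to be zero. The likelihood estimate of $\sigma_X^2$ then becomes $\sum X^2/n$, or $S_X^2 + \bar X^2$. The distribution of this quantity is the same as that of $S_X^2$ but with degrees of freedom $n$ instead of $n - 1$. Therefore, by analogy with the previous result (17) used in testing $H_2'$, if we write

(29) $v = \dfrac{S_Y^2}{S_X^2 + \bar X^2} = \dfrac{1 + R_2}{1 - R_2},$

then the likelihood criterion of $H_6'$ becomes

(30) $L_6 = \dfrac{4v}{\gamma_0 (1 + v/\gamma_0)^2},$

and

(31) $p(v) = \dfrac{1}{\gamma_0 B[\frac12(n-1), \frac12 n]} \left(\dfrac{v}{\gamma_0}\right)^{\frac12(n-3)} \left(1 + \dfrac{v}{\gamma_0}\right)^{-\frac12(2n-1)}.$

Hence the test appropriate to $H_6$ is the associated $z$-test, $z = \frac12 \log \dfrac{n v}{(n-1)\gamma_0}$, with $f_1 = n - 1$, $f_2 = n$. We can use the $z$-table as before.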

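The test of $H_7$. Here we test whether $\xi_X = 0$. It may be seen that $L_7$ is a function of $\bar X$ and $(\gamma_0 S_X^2 + S_Y^2)$. Further, assume that $\rho_{XY} = 0$ and also that $\sigma_Y^2 = \gamma_0 \sigma_X^2$. It will then follow that $\sum (X - \bar X)^2$ and $\frac{1}{\gamma_0}\sum (Y - \bar Y)^2$ are each distributed independently as $\chi^2 \sigma_X^2$ with $n - 1$ degrees of freedom. Hence their sum is distributed as $\chi^2 \sigma_X^2$ with $2n - 2$ degrees of freedom. Also, if $\xi_X = 0$ (and $H_7'$ is true), $\bar X$ will be distributed normally about zero with standard error $\sigma_X/\sqrt{n}$. Hence we may write

(32) $L_7 = \left(1 + \dfrac{t_2^2}{2n-2}\right)^{-2},$

where

(33) $t_2 = \bar X \Big/ \sqrt{\dfrac{\sum (X - \bar X)^2 + \sum (Y - \bar Y)^2/\gamma_0}{n(2n - 2)}}$

is distributed in accordance with "Student's" distribution with $2n - 2$ degrees of freedom,

(34) $p(t_2) = \dfrac{1}{\sqrt{2n-2}\, B[\frac12, \frac12(2n-2)]} \left(1 + \dfrac{t_2^2}{2n-2}\right)^{-\frac12(2n-1)}.$

In terms of original variables,

(35) $\dfrac{t_2^2}{2n-2} = \dfrac{\gamma_0 \bar X^2}{\gamma_0 S_X^2 + S_Y^2} = \dfrac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)}.$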
4. Comparison of the $R_1$-test and $R_2$-test with the $r_{12}$-test in cases where $H_2$ and $H_6$ are true respectively. It will be noted that in the preceding discussion we have been concerned with three different tests of the hypothesis that $\rho_{12}$ has some specified value $\rho_0$. When there is no information available regarding the means and standard deviations of $x_1$ and $x_2$, the test is based on the sampling distribution of the ordinary product-moment coefficient $r_{12}$. If it may be assumed that $\sigma_1 = \sigma_2$, then we have the estimate

$R_1 = \dfrac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}.$

If besides $\sigma_1 = \sigma_2$ it may also be assumed that $\xi_1 = \xi_2$, then we have the estimate

$R_2 = \dfrac{2 r_{12} s_1 s_2 - \frac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \frac12(\bar x_1 - \bar x_2)^2}.$

From the point of view of testing hypotheses, all these criteria, $r_{12}$, $R_1$, $R_2$, follow from the application of the likelihood ratio method. It will be noted that if $\sigma_1 = \sigma_2$, either the $r_{12}$ or the $R_1$ test may be used. But, insofar as the likelihood principle is accepted, the latter should be regarded as the "better" test. Again, if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, all three tests may be used, but that based on $R_2$ will be the "best". A question of interest is to investigate just what is meant by the "better" or the "best" test. We may ask how far the improvements are sufficient to justify the use of the $R_1$ and $R_2$ tests in place of the more generally used $r_{12}$ test. One method of comparison is to examine what Neyman and Pearson [12] have termed the "power function" of the tests.

For example, consider testing the hypothesis that a parameter $\theta$ has the value $\theta_0$ in the population sampled. The power of the test criterion $T$ with regard to the alternative hypothesis that $\theta = \theta_1$ is given by the expression

$\beta(\theta_1) = P\{T > T_\alpha \mid \theta = \theta_1\},$

where $T_\alpha$ is the value of $T$ at the level of significance $\alpha$. This quantity $\beta(\theta)$ measures the chance that the test as specified will detect the fact that $\theta \ne \theta_0$, i.e., the chance of rejecting the hypothesis when it is not true. A test whose power function is never less than that of any other test of the same size is termed the uniformly most powerful test.

If the permissible alternative hypotheses to $\theta = \theta_0$ are both $\theta < \theta_0$ and $\theta > \theta_0$, then the power of the test $T$ is given by the expression

$\beta(\theta) = 1 - P\{T_\alpha' < T < T_\alpha'' \mid \theta\},$

where $T_\alpha'$ and $T_\alpha''$ are the values of $T$ at both ends of the distribution at the level of significance $\alpha$. When the test is such that the power function has a minimum value $\alpha$ at $\theta = \theta_0$, it is said to be unbiased.

A test is termed biased if, for certain alternative hypotheses $\theta \ne \theta_0$, the chance of rejecting the hypothesis $\theta = \theta_0$ is less than the chance of rejecting this hypothesis when it is true.
In what follows it is proposed to compare the power functions of the tests based on $r_{12}$, $R_1$, and $R_2$, in order to obtain more complete evidence of the extent to which one is "better" than the other.

The distribution of $R_1$. We have obtained the distribution of $u$ when $H_2'$, and therefore $H_2$, is true. We are now able to find the distribution of $R_1$ by applying the transformation of (15). Thus the distribution of $R_1$ in terms of $\rho_0$ is

(36) $p(R_1 \mid \rho_0) = \dfrac{(1 - \rho_0^2)^{\frac12(n-1)} (1 - R_1^2)^{\frac12(n-3)}}{2^{n-2} B[\frac12(n-1), \frac12(n-1)]\, (1 - \rho_0 R_1)^{n-1}}.$

The significance of $R_1$ may be assessed by the $z$-test, where we take

(37) $z = \tfrac12 \log \dfrac{1 + R_1}{1 - R_1} - \tfrac12 \log \dfrac{1 + \rho_0}{1 - \rho_0} = z' - \zeta$, say,

with degrees of freedom $f_1 = f_2 = n - 1$.† Fisher's $z$-table may be used in this connection.

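When $\rho_{12} = 0$, the distribution simplifies to

(38) $p(R_1 \mid \rho_{12} = 0) = \dfrac{(1 - R_1^2)^{\frac12(n-3)}}{2^{n-2} B[\frac12(n-1), \frac12(n-1)]} = \dfrac{(1 - R_1^2)^{\frac12(n-3)}}{B[\frac12(n-1), \frac12]},$

since $2^{n-2} B[\frac12(n-1), \frac12(n-1)]$ is equal to $B[\frac12(n-1), \frac12]$ by the duplication formula [16, p. 240].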
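The identity used in (38) is the case $a = \frac12(n-1)$ of $B(a, a) = 2^{1-2a} B(a, \frac12)$. That identity follows from Legendre's duplication formula for the gamma function. The sketch below checks it numerically on the log scale; the helper names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def duplication_sides(n):
    """Logs of 2^(n-2) B[(n-1)/2, (n-1)/2] and of B[(n-1)/2, 1/2]."""
    a = (n - 1) / 2
    return (n - 2) * math.log(2) + log_beta(a, a), log_beta(a, 0.5)

for n in (4, 10, 25, 100):
    print(n, *duplication_sides(n))
```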

The distribution (38) is similar in form to that of $p(r_{12} \mid \rho_{12} = 0)$, with $n - 1$ degrees of freedom instead of $n - 2$. The significance levels of $R_1$ may then be obtained directly from the $r$-table [1] for the case $\rho_{12} = 0$, entering with degrees of freedom $n - 1$.

The distribution of $R_2$. The distribution of $R_2$ may be obtained from that of $v$ when $H_6'$, and therefore $H_6$, is true. It is

(39) $p(R_2 \mid \rho_{12} = \rho_0) = \dfrac{(1 + \rho_0)^{\frac12 n} (1 - \rho_0)^{\frac12(n-1)} (1 + R_2)^{\frac12(n-3)} (1 - R_2)^{\frac12(n-2)}}{2^{n-\frac32} B[\frac12(n-1), \frac12 n]\, (1 - \rho_0 R_2)^{n-\frac12}}.$

This agrees with the result first obtained by R. A. Fisher [4] by a different method. The significance of $R_2$ may be assessed by the $z$-test, where we take

(40) $z = \tfrac12 \log \dfrac{n(1 + R_2)}{(n-1)(1 - R_2)} - \tfrac12 \log \dfrac{1 + \rho_0}{1 - \rho_0},$

with degrees of freedom $f_1 = n - 1$, $f_2 = n$. The tables for use with the $z$-test may be used in this connection.

† Since finding the distribution of $R_1$ (36), (38) and the relation between $R_1$ and $z'$ (37), my attention has been drawn to a recent paper by DeLury [2] in which the same results are obtained. Since my method of derivation is different from his, I have thought it worthwhile to retain it here.

When $\rho_{12} = 0$, the distribution is simplified to

(41) $p(R_2 \mid \rho_{12} = 0) = \dfrac{(1 + R_2)^{\frac12(n-3)} (1 - R_2)^{\frac12(n-2)}}{2^{n-\frac32} B[\frac12(n-1), \frac12 n]},$

which is simply a Pearson Type I curve.

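Power functions of $R_1$ and $R_2$. In order to find the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses $H_t$ specifying $\rho_{12} = \rho_t < \rho_0$, it will be convenient to consider the incomplete beta function distributions

(42) $p(x_1) = \dfrac{1}{B[\frac12(n-1), \frac12(n-1)]}\, x_1^{\frac12(n-3)} (1 - x_1)^{\frac12(n-3)},$

(43) $p(x_2) = \dfrac{1}{B[\frac12(n-1), \frac12 n]}\, x_2^{\frac12(n-3)} (1 - x_2)^{\frac12(n-2)},$

where $x_1 = \dfrac{u}{\gamma_0(1 + u/\gamma_0)}$ and $x_2 = \dfrac{v}{\gamma_0(1 + v/\gamma_0)}$. From the Tables of the Incomplete Beta Function [13] we can find the values of $x_1$ and $x_2$ at the significance level $\alpha'$, i.e.

(44) $I_{x_1'}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha',$

(45) $I_{x_2'}[\tfrac12(n-1), \tfrac12 n] = \alpha'.$

The values of $R_1'(\alpha)$ and $R_2'(\alpha)$ may then be calculated from the relations

(46) $R_1 = \dfrac{u - 1}{u + 1} = \dfrac{-1 + x_1 + \gamma_0 x_1}{1 - x_1 + \gamma_0 x_1},$

(47) $R_2 = \dfrac{v - 1}{v + 1} = \dfrac{-1 + x_2 + \gamma_0 x_2}{1 - x_2 + \gamma_0 x_2}.$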

The power functions of $R_1$ and $R_2$ thus found may be given as follows:

(48) $\beta'(\rho_t \mid R_1) = P\{R_1 < R_1'(\alpha) \mid \rho_t\},$

(49) $\beta'(\rho_t \mid R_2) = P\{R_2 < R_2'(\alpha) \mid \rho_t\}.$

In the same way, for any alternative hypothesis $H_t$ specifying $\rho_{12} = \rho_t > \rho_0$, we can find the values of $x_1$ and $x_2$ at the significance level $\alpha''$, at the other end of the distribution, i.e.

(50) $1 - I_{x_1''}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha'',$

(51) $1 - I_{x_2''}[\tfrac12(n-1), \tfrac12 n] = \alpha''.$

Thence the corresponding values of $R_1''(\alpha)$ and $R_2''(\alpha)$ may be obtained, and their power functions are

(52) $\beta''(\rho_t \mid R_1) = P\{R_1 > R_1''(\alpha) \mid \rho_t\},$

(53) $\beta''(\rho_t \mid R_2) = P\{R_2 > R_2''(\alpha) \mid \rho_t\}.$

The power functions of $R_1$ and $R_2$ with respect to alternative hypotheses specifying $\rho_{12} = \rho_t < \rho_0$ and $\rho_t > \rho_0$ may now be obtained by adding (48) and (52), or (49) and (53), or, more simply,

(54) $\beta(\rho_t \mid R_1) = 1 - P\{R_1'(\alpha) < R_1 < R_1''(\alpha) \mid \rho_t\},$

(55) $\beta(\rho_t \mid R_2) = 1 - P\{R_2'(\alpha) < R_2 < R_2''(\alpha) \mid \rho_t\},$

where $R_1'(\alpha)$, $R_1''(\alpha)$, $R_2'(\alpha)$, $R_2''(\alpha)$ are the values of $R_1$ and $R_2$ at the two ends of the distribution at the significance level $\alpha = \alpha' + \alpha''$.

After transformation, the tests based on $R_1$ and $R_2$ are equivalent to tests regarding the equality of variances. It therefore follows from Neyman and Pearson's work [11] on the uniformly most powerful test of the hypothesis $\sigma_Y^2/\sigma_X^2 = \gamma_0$, with alternatives $\sigma_Y^2/\sigma_X^2 = \gamma_t < \gamma_0$ (or $\gamma_t > \gamma_0$), that:

(1) if $\sigma_1 = \sigma_2$ and the alternative to $\rho_{12} = \rho_0$ is that $\rho_{12} = \rho_t < \rho_0$ (or, in a second case, $\rho_t > \rho_0$), the test based on $R_1$ is the uniformly most powerful test, i.e., it is more powerful than that based on $r_{12}$; and

(2) if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, then the test based on $R_2$ is the uniformly most powerful test, i.e., it is more powerful than those based on either $r_{12}$ or $R_1$.

For illustration, let us take a special case, say

(a) $n = 10$, $\rho_0 = 0.6$, $\alpha' = \alpha'' = 0.025$.

From the tables, we obtain the values

$x_1' = .198902$, $x_1'' = .801098$;   $x_2' = .184863$, $x_2'' = .772916$,

and by calculation the values

$R_1'(\alpha) = -.0034$, $R_1''(\alpha) = .8831$;   $R_2'(\alpha) = -.0487$, $R_2''(\alpha) = .8632$.


The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_1$ have been calculated and are given in Table II. For $\rho_1 < \rho_0$, a comparison of columns 2 and 4 will show that the test based on $R_2$ is uniformly more powerful than that based on $R_1$ (for $\rho_1 > \rho_0$, compare columns 3 and 5).

The unbiased test of $H_1$ and $H_2$. When, however, the alternatives are that $\rho_{12} = \rho_1 < \rho_0$ and $\rho_1 > \rho_0$, questions of bias may be introduced.

In the case of $H_1$, i.e. when $R_1$ is used, it was established by J. Neyman in his lecture courses [8] that if we test whether $\sigma_y^2/\sigma_x^2 = \gamma_0$, where the alternatives are $\gamma_1 < \gamma_0$ and $\gamma_1 > \gamma_0$, and if the samples of $X$ and $Y$ are of equal size, then the test based on cutting off equal tail areas of the distribution of $x_1$ is unbiased and of type B [7]. Therefore the same may be said of the $R_1$-test.

In the case of $H_2$, the equivalent transformed test is again whether $\sigma_y^2/\sigma_x^2 = \gamma_0$. But the test now corresponds to one in which an estimate of $\sigma_y$ is based on $f_y = n - 1$ degrees of freedom and an estimate of $\sigma_x$ on $f_x = n$ degrees of freedom. The degrees of freedom not being equal, it is known that if equal tail areas are cut off from the sampling distribution of $x_2$, this test will be biased. Neyman's result [8] shows that if the lower and upper significance levels are taken at $x_2'$ and $x_2''$, then the equation

(56) - a:*V* = 

should be satisfied if the test is unbiased. Since in the present case the bias of the test based on the equal-tail-area critical region will be very small, the rejection levels $R_2'(\alpha)$ and $R_2''(\alpha)$ in the numerical investigation given in Table III have been selected taking equal tail areas for simplicity.


TABLE II

Values of the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses $\rho_{12} = \rho_1 < \rho_0$ or $\rho_1 > \rho_0$

($n = 10$; $\rho_0 = 0.6$; $\alpha' = \alpha'' = 0.025$)

$\rho_1$    $\beta'(\rho_1 \mid R_1)$    $\beta''(\rho_1 \mid R_1)$    $\beta'(\rho_1 \mid R_2)$    $\beta''(\rho_1 \mid R_2)$
-0.8        .9984
-0.6        .9739                                  .9807
-0.4        .9867                                  .9006
-0.2        .7189                                  .7360
 0.0        .4960            .0002                 .5093            .0001
 0.2        .2744            .0008                 .2809            .0006
 0.3        .1825            .0018                 .1860            .0015
 0.4        .1106            .0042                 .1111            .0037
 0.5        .0576            .0099                 .0580            .0093
 0.6        .025             .025                  .025             .025
 0.7        .0081            .0678                 .0080            .0720
 0.8        .0016            .1995                 .0015            .2160
 0.9        .0001            .6960                 .0001            .6289
 0.95                        .8979                                  .9150
 0.975                       .9866                                  .9897


If we now take a special case similar to (a) above, but taking equal tail areas, so that

$n = 10$, $\rho_0 = 0.6$, $\alpha = 0.05$ ($\alpha' = \alpha'' = \tfrac12\alpha$),

we can obtain the values of the $x$'s and of the $R$'s as before.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_1$ are given in columns 3 and 4 of Table III. These values are equal to the sums of the corresponding values in Table II. The values of the power functions of $R_1$ and $R_2$ for the following additional cases are also given in Table III:

(b)  $n = 10$, $\rho_0 = 0.8$, $\alpha = 0.05$;

(c)  $n = 20$, $\rho_0 = 0.6$, $\alpha = 0.05$;

(d)  $n = 20$, $\rho_0 = 0.8$, $\alpha = 0.05$.


Comparison of the power functions. We may now deal with the question raised at the beginning of this section, namely, what is meant by the "better" or "best" test. We shall proceed to compare, for certain special cases, the power functions of the three tests, all of which are applicable where it may be assumed that $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$.

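In the first place it will be noted that the power function of the test based on equal tail areas of the $r_{12}$ distribution is

(57)  $\beta(\rho_1 \mid r_{12}) = 1 - P\{r_{12}'(\alpha) < r_{12} < r_{12}''(\alpha) \mid \rho_1\},$

where

(58)  $P\{r_{12} < r_{12}'(\alpha) \mid \rho_0\} = \int_{-1}^{r_{12}'(\alpha)} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha,$

      $P\{r_{12} > r_{12}''(\alpha) \mid \rho_0\} = \int_{r_{12}''(\alpha)}^{1} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha,$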
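and

(59)  $p(r_{12} \mid \rho_{12} = \rho_0) = \dfrac{(1-\rho_0^2)^{\frac12(n-1)}(1-r_{12}^2)^{\frac12(n-4)}}{\pi\,\Gamma(n-2)}\,\dfrac{d^{\,n-2}}{d(\rho_0 r_{12})^{n-2}}\left\{\dfrac{\cos^{-1}(-\rho_0 r_{12})}{\sqrt{1-\rho_0^2 r_{12}^2}}\right\}.$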


The probability that $r_{12}$ is less than some specified value may be obtained from Tables of the Correlation Coefficient (F. N. David [1]) or, where these are not sufficiently detailed, by using R. A. Fisher's $z$-transformation for $r_{12}$ [4].
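As a rough numerical companion to eqs. (57) and (58), the sketch below approximates the equal-tail $r_{12}$ test and its power with Fisher's $z$-transformation instead of the exact density (59). It assumes that $z = \tanh^{-1} r_{12}$ is normal with mean $\tanh^{-1}\rho$ and standard deviation $1/\sqrt{n-3}$, with $n$ the sample size, and it ignores the small bias term in the mean. The function name `r12_test_fisher_z` is hypothetical, and its output is an approximation, not a reproduction of Table III.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist


def r12_test_fisher_z(n, rho0, rho1, alpha=0.05):
    """Approximate equal-tail test of rho_12 = rho0 via Fisher's z = atanh(r).

    Assumes z ~ Normal(atanh(rho), 1/sqrt(n - 3)), n being the sample size.
    Returns the critical values r'(alpha), r''(alpha) and the approximate
    power beta(rho1 | r12) of eq. (57).
    """
    std = NormalDist()
    se = 1.0 / sqrt(n - 3)
    c = std.inv_cdf(1 - alpha / 2)            # alpha/2 cut off in each tail
    z0 = atanh(rho0)
    z_lo, z_hi = z0 - c * se, z0 + c * se     # acceptance region in the z scale
    z1 = atanh(rho1)
    # probability of landing in the acceptance region when rho_12 = rho1
    accept = std.cdf((z_hi - z1) / se) - std.cdf((z_lo - z1) / se)
    return tanh(z_lo), tanh(z_hi), 1.0 - accept


# Case (a): n = 10, rho0 = 0.6, alpha = 0.05, alternative rho1 = 0.2
r_lo, r_hi, power = r12_test_fisher_z(10, 0.6, 0.2)
print(f"r' = {r_lo:.4f}, r'' = {r_hi:.4f}, power at rho1 = 0.2: {power:.4f}")
```

At $\rho_1 = \rho_0$ the function returns the size $\alpha$ itself, which is a quick check that the two tails each carry $\tfrac12\alpha$.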

The cases considered are (a), (b), (c), (d) as defined above. The power functions of the three different tests (all based upon the equal tail areas of their distributions) are given in Table III. The figures for $r_{12}$ in brackets are those obtained by the $z$-transformation approximation.

An examination of Tables II and III brings out the following points:

(1) For the reasons given above, the $R_2$ test based on equal tail area critical regions is very slightly biased; the amount of this bias for the case $n = 10$, $\rho_0 = 0.6$, $\alpha = 0.05$ is shown in Table IV. This shows that the power of the $R_2$ test falls below 0.05, in the fifth or sixth decimal place, for $0.59 < \rho_1 < 0.60$. As a result this test is very slightly less powerful than the other two tests for alternatives with $\rho_1$ slightly less than $\rho_0$. The effect is, however, of little importance.

(2) Except in this short range of $\rho_1$, we find that

      $\beta(\rho_1 \mid R_2) \ge \beta(\rho_1 \mid R_1) \ge \beta(\rho_1 \mid r_{12}).$



TABLE III

Comparison of the power functions of the $r_{12}$, $R_1$ and $R_2$ tests with respect to alternative hypotheses
That is to say, the power function of the $R_2$ test never lies below those of the $R_1$ and $r_{12}$ tests, and that of the $R_1$ test never lies below that of the $r_{12}$ test.

(3) The gain in sensitivity, as measured by the chance that the test will detect that $\rho_1 \ne \rho_0$, is, however, very small. Further, $R_1$ may only be used if it is known that $\sigma_1 = \sigma_2$, and $R_2$ only if it is known in addition that $\xi_1 = \xi_2$. It will only be in rather special problems that the statistician can feel confident that such assumptions are justified. We will therefore probably prefer the test based on the ordinary product moment correlation coefficient $r_{12}$, since the slight loss in power will be felt to be outweighed by the gain in simplicity. It is, however, only after an objective comparison of the consequences of applying the three tests that a definite opinion on these points can be reached.


TABLE IV

$\rho_1$     $\beta'(\rho_1 \mid R_2)$    $\beta''(\rho_1 \mid R_2)$    $\beta(\rho_1 \mid R_2)$
0.5          .0580                  .0093                  .0673
0.590        .0274235               .0225806               .0500041
0.591        .0271778               .0228190               .0499968
0.592        .0269359               .0230578               .0499937
0.593        .0266934               .0232976               .0499910
0.594        .0264615               .0235337               .0499862
0.595        .0262096               .0237798               .0499894
0.596        .0259677               .0240222               .0499899
0.597        .0257257               .0242651               .0499908
0.598        .0254838               .0245107               .0499945
0.599        .0252419               .0247540               .0499959
0.6          .025                   .025                   .05


6. Summary. Various hypotheses relating to a population of two normal correlated variates have been considered, and the appropriate test criteria for each hypothesis have been derived by the likelihood ratio method. The distributions of the likelihood ratio criteria, or of monotonic functions of them, have been obtained with the aid of transformation (14). References have been given to tables from which significance levels for use in conjunction with the tests may be obtained; a new table of significance levels for the tests of $H_1$ and $H_2$ was given.

The power functions of $r_{12}$, $R_1$ and $R_2$ have been compared; from these power functions it was concluded that $R_1$ and $R_2$ are suitable respectively for testing the hypothesis when $\sigma_1 = \sigma_2$ and when, in addition, $\xi_1 = \xi_2$.

In conclusion, I should like to express my indebtedness to Professor E. S. Pearson for continued advice and help in the preparation of this paper, and to Dr. A. Wald and Professor S. S. Wilks for valuable suggestions.






Conditions defining $\Omega$ and $\omega$ together with the likelihood criteria appropriate for testing the hypotheses $H_i$
ON A LEAST SQUARES ADJUSTMENT OF A SAMPLED FREQUENCY TABLE WHEN THE EXPECTED MARGINAL TOTALS ARE KNOWN

By W. Edwards Deming and Frederick F. Stephan 


1. Introduction. There are situations in sampling wherein the data furnished by the sample must be adjusted for consistency with data obtained from other sources or with deductions from established theory. For example, in the 1940 census of population a problem of adjustment arises from the fact that although there will be a complete count of certain characters for the individuals in the population, considerations of efficiency will limit to a sample many of the cross-tabulations (joint distributions) of these characters. The tabulations of the sample will be used to estimate the results that would have been obtained from cross-tabulations of the entire population.¹ The situation is shown in Fig. 1 in parallel tables for the universe and for the sample. For the universe the marginal totals $N_{i.}$ and $N_{.j}$ are known, but not the cell frequencies $N_{ij}$; for the sample, however, tabulation gives both the cell frequencies $n_{ij}$ and the marginal totals $n_{i.}$ and $n_{.j}$.

In estimating any cell frequency of the universe, such as $N_{ij}$, three possibilities present themselves: from the sample one may make an estimate from the $i$th row alone, another from the $j$th column alone, and still another from the over-all ratio $N/n$. Specifically, the three estimates would be $n_{ij}N_{i.}/n_{i.}$, $n_{ij}N_{.j}/n_{.j}$, and $n_{ij}N/n$. As a result of sampling errors these will not be identical except by accident, and though any of them by itself may be considered accurate enough, still, if the whole $r \times s$ table of universe cell frequencies were so estimated, the marginal totals would not come out right. In this paper we present a rapid method of adjustment which in effect combines all three of the estimates just mentioned, and at the same time enforces agreement with the marginal totals. The method is extended to varying degrees of cross-tabulation in three dimensions.
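To make the disagreement concrete, the following sketch uses a small hypothetical 2 × 2 universe and sample (the numbers are invented for illustration and do not come from the census). It forms the three estimates of $N_{ij}$ just described, and then shows that scaling every cell by its row alone reproduces the row totals $N_{i.}$ but not the column totals $N_{.j}$.

```python
# Hypothetical universe margins and sample cell counts (illustration only)
N_row = [600, 400]                    # N_i.
N_col = [700, 300]                    # N_.j
N = sum(N_row)                        # N = 1000
n = [[45, 20], [30, 5]]               # sample n_ij
n_row = [sum(row) for row in n]       # n_i.
n_col = [sum(col) for col in zip(*n)] # n_.j
n_tot = sum(n_row)                    # n


def three_estimates(i, j):
    """Row-based, column-based and over-all-ratio estimates of N_ij."""
    by_row = n[i][j] * N_row[i] / n_row[i]
    by_col = n[i][j] * N_col[j] / n_col[j]
    overall = n[i][j] * N / n_tot
    return by_row, by_col, overall


# Estimate every cell from its row alone, then add up the columns.
row_based = [[three_estimates(i, j)[0] for j in range(2)] for i in range(2)]
col_sums = [sum(row_based[i][j] for i in range(2)) for j in range(2)]

print("estimates of N_11:", [round(e, 2) for e in three_estimates(0, 0)])
print("column totals from row-based estimates:",
      [round(c, 2) for c in col_sums], "vs known", N_col)
```

Here the three estimates of $N_{11}$ are about 415.4, 420 and 450, and the row-based table has column totals of about 758.2 and 241.8 instead of 700 and 300.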

In any problem of adjustment where the conditions are intricate it is necessary to have a method that is straightforward and self-checking; this becomes imperative when we realize that in the three dimensional Case VII of the problem now at hand (vide infra), any adjustment in one cell must be balanced by adjustments in at least seven others. The method of least squares is one possible procedure for effecting an adjustment and at the same time enforcing certain conditions among the marginal totals. It is essentially a scheme for

¹ Examples will occur in the 1940 census publications. Further discussion of this problem and of the sampling procedure is given by the authors in "The sampling procedure of the 1940 population census," Jour. Am. Stat. Assn., Vol. 35 (1940), pp 6 5 6
arriving at a set of calculated or adjusted observations that will satisfy the conditions of the problem, and at the same time minimize the sum of the weighted squares of the residuals, symbolized as

(1)  $S = \sum w\,(n_c - n_o)^2,$

$n_c$ and $n_o$ being the calculated and observed numbers in a cell, and $n_c - n_o$ the corresponding residual. It is the nature of the conditions imposed on the adjusted values that distinguishes one type of problem from another. Least squares has the practical advantage of uniqueness, once the weights of the observations have been assigned, and it possesses the theoretical dignity of giving one kind of "best" estimates under ideal conditions of sampling. For our present purpose we shall minimize sums of the form

(2)  $S = \sum_i (m_i - n_i)^2/n_i,$

$n_i$ being the observed frequency in the $i$th cell, and $m_i$ the calculated or adjusted frequency therein. The conditions among the $m_i$ will arise from the fact that the marginal totals, after adjustment, must agree with their expected values, namely, the deflated marginal totals of the universe (for example, $m_{i.}$ and $m_{.j}$ as defined in eqs. (6) and (7)).

By definition, weight and variance are inversely proportional; hence the principle of least squares is identical with the minimizing of chi-square. Here the variance in the $i$th cell is $\nu_i(1 - \nu_i/n)$, where $\nu_i$ is the expected number in that cell and $n$ is the total number in the sample. Now if $\nu_i$ is sufficiently well approximated by $n_i$, it follows that if no cell contains an appreciable fraction of the whole sample (a circumstance requiring a fair-sized number of cells, perhaps 100), the variance may be taken as $n_i$ for every $i$, and the minimized $S$ can be used as chi-square. But regardless of the number of cells, if the $\nu_i$ be not too much different from one another, so that the factor $1 - \nu_i/n$ may be treated as a constant, we still get the least squares solution by minimizing $S$ as defined in eq. (2).

2. The two dimensional problem. Suppose that the data on two characteristics (e.g. age and highest grade of school completed) are obtained for each member of a universe of $N$ individuals, and that tabulations of the data provide either (a) one set of marginal totals $N_{1.}, N_{2.}, \ldots, N_{r.}$; or (b) in addition, the marginal totals $N_{.1}, N_{.2}, \ldots, N_{.s}$. The nature of the tabulations is presumed such that it is not feasible to count the numbers $N_{ij}$ in the cells, as would be done if one character were crossed with the other. Suppose, however, that for a sample of $n$ individuals selected in a random manner from the universe, the two characters are crossed with each other, so that we know not only all the $r + s$ marginal totals of the sample but also the numbers $n_{ij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$). The problem is to estimate the unknown frequencies $N_{ij}$ in the cells of the universe. This will be done by finding the calculated or adjusted sample frequencies $m_{ij}$, and then inflating them by the inverse sampling ratio $N/n$.
For the least squares solution we seek those values of $m_{ij}$ that minimize²

(3)  $S = \sum (m_{ij} - n_{ij})^2/n_{ij},$

wherein the $m_{ij}$ are subjected to one of the following sets of conditions:

Case I: One set of marginal totals known. Assume $N_{1.}, N_{2.}, \ldots, N_{r.}$ to be known. Then we require

(4)  $\sum_j m_{ij} = m_{i.}, \quad i = 1, 2, \ldots, r.$

These $r$ equations constitute $r$ conditions on the adjusted $m_{ij}$.


Fig. 1. Showing the System of Notation for the Cell Frequencies and Marginal Totals of the Universe and the Sample in the Two Dimensional Problem. (Universe: cell frequencies $N_{ij}$ unknown; marginal totals $N_{i.}$ and $N_{.j}$ known; $N$ known. Sample: cell frequencies $n_{ij}$ known; marginal totals $n_{i.}$ and $n_{.j}$ known; $n$ known. $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$.)

Case II: Both sets of marginal totals known. Here the adjusted cell frequencies must satisfy not only condition (4) but also

(5)  $\sum_i m_{ij} = m_{.j}, \quad j = 1, 2, \ldots, s - 1,$

there being now a total of $r + s - 1$ conditions. In both cases,

(6)  $m_{i.} = N_{i.}\,n/N,$

(7)  $m_{.j} = N_{.j}\,n/N.$

In other words, $m_{i.}$ and $m_{.j}$ are the deflated marginal totals, i.e. the universe totals divided by the actual sampling ratio $N/n$. The $m_{i.}$ and $m_{.j}$ are not independent, for

(8)  $N_{.1} + N_{.2} + \cdots + N_{.s} = N_{1.} + N_{2.} + \cdots + N_{r.} = N.$

² The sign $\sum$ will denote summation over all possible cells, unless otherwise noted. $\sum_i$ will denote summation over all values of $i$, and similarly for an inferior $j$ or $k$. A dot, as in $n_{.j}$, will signify the result of summing the $n_{ij}$ over all values of $i$ in the $j$th column.

It is for this reason that if $i$ runs through all $r$ values in eq. (4), then $j$ can run through only $s - 1$ values in eq. (5). A similar equation also exists for the marginal totals of the sample, namely,

(9)  $n_{.1} + n_{.2} + \cdots + n_{.s} = n_{1.} + n_{2.} + \cdots + n_{r.} = n.$

Solution of the two dimensional Case I. Assuming that the adjusted values of the $m_{ij}$ have been found, let each take on a small variation $\delta m_{ij}$; then the differentials of eqs. (3) and (4) show that

(10)  $\tfrac12\,\delta S = \sum \{(m_{ij} - n_{ij})/n_{ij}\}\,\delta m_{ij} = 0$  (one equation),

(11)  $\sum_j \delta m_{ij} = 0, \quad i = 1, 2, \ldots, r$  ($r$ equations).

Multiply now eq. ($11_i$) by the arbitrary Lagrange multiplier $-\lambda_{i.}$, and add eqs. (10) and (11) to obtain

(12)  $\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.}\}\,\delta m_{ij} = 0$  (one equation).

By the usual argument, one may now set each brace equal to zero, recognizing that the $r$ Lagrange multipliers are then no longer arbitrary but must satisfy the relation

(13)  $m_{ij} = n_{ij}(1 + \lambda_{i.}).$

The adjusted frequencies $m_{ij}$ can be computed at once as soon as the $\lambda_{i.}$ are found. To evaluate them one may rewrite the conditions (4) using the right-hand member of (13) for $m_{ij}$, obtaining

(14)  $m_{i.} = n_{i.}(1 + \lambda_{i.}).$

Another way to arrive at this same relation is to sum each member of eq. (13) over the $i$th row. However obtained, $\lambda_{i.}$ is now known, since $m_{i.}$ and $n_{i.}$ are known, and in fact eq. (13) now gives

(15)  $m_{ij} = n_{ij}(m_{i.}/n_{i.}).$

The adjustment is thus a simple proportionate one by rows, the cells in any one row all being raised or lowered by the proportionate adjustment in the row total. Case I thus amounts to $r$ independent one dimensional proportionate adjustments, one for each row, and any one or all may be carried out, as desired. This result can be obtained by a simpler approach but is presented in this way for consistency with later cases.

The minimized sum of squares may be computed directly, or from the row totals by seeing that

(16)  $S = \sum_i (m_{i.} - n_{i.})^2/n_{i.}.$

The term $(m_{i.} - n_{i.})^2/n_{i.}$ for the $i$th row may be considered separately, and used as $\chi^2$ with $s - 1$ degrees of freedom, or all rows may be combined into the minimized $S$ as given in eq. (16), and used as $\chi^2$ with $r(s - 1)$ degrees of freedom.
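A minimal sketch of the two dimensional Case I, assuming every $n_{ij} > 0$ so that the weights $1/n_{ij}$ of eq. (3) exist. It applies the row-proportionate adjustment (15) and checks that the minimized $S$ computed cell by cell agrees with the row-total form (16). The function names and the data are hypothetical.

```python
def adjust_case_i(n, row_targets):
    """Two dimensional Case I: scale every cell of row i by m_i./n_i. (eq. (15))."""
    m = []
    for row, target in zip(n, row_targets):
        factor = target / sum(row)            # m_i. / n_i.
        m.append([x * factor for x in row])
    return m


def s_direct(n, m):
    """Eq. (3): sum over all cells of (m_ij - n_ij)^2 / n_ij."""
    return sum((mij - nij) ** 2 / nij
               for nrow, mrow in zip(n, m)
               for nij, mij in zip(nrow, mrow))


def s_from_rows(n, row_targets):
    """Eq. (16): sum over rows of (m_i. - n_i.)^2 / n_i."""
    return sum((t - sum(row)) ** 2 / sum(row) for row, t in zip(n, row_targets))


n = [[30, 50, 20], [10, 25, 15]]   # hypothetical sample n_ij
targets = [110, 45]                 # hypothetical deflated row totals m_i.
m = adjust_case_i(n, targets)
print(m)
print(s_direct(n, m), s_from_rows(n, targets))
```

With these numbers the first row is raised by 10 per cent and the second lowered by 10 per cent, and both forms of $S$ come to 1.5.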

Solution of the two dimensional Case II. In addition to eqs. (11) we now have also

(17)  $\sum_i \delta m_{ij} = 0, \quad j = 1, 2, \ldots, s - 1,$

which comes by differentiating eqs. (5). By addition of eqs. (10), (11), and (17), after multiplying eq. ($11_i$) by $-\lambda_{i.}$ and eq. ($17_j$) by $-\lambda_{.j}$, we obtain

(18)  $\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.} - \lambda_{.j}\}\,\delta m_{ij} = 0.$

Equating each brace to zero, as before, we find that

(19)  $m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j}),$

wherein $\lambda_{.s}$ is to be counted 0. The adjustment is now no longer proportionate by rows, but involves every cell.

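To evaluate the Lagrange multipliers in eq. (19) we may sum the two members downward and across in Fig. 1 and obtain the $r + s - 1$ normal equations

(20)  $n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} = m_{i.} - n_{i.}, \quad i = 1, 2, \ldots, r,$

      $\sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} = m_{.j} - n_{.j}, \quad j = 1, 2, \ldots, s - 1.$

These can be reduced for numerical computation. The top row solved for $\lambda_{i.}$ gives

(21)  $\lambda_{i.} = (1/n_{i.})\{m_{i.} - n_{i.} - \sum_j n_{ij}\lambda_{.j}\},$

whereupon by substitution into the bottom row of eqs. (20) we arrive at the $s - 1$ normal equations

(22)
$$
\begin{array}{c|cccc|c}
 & \lambda_{.1} & \lambda_{.2} & \cdots & \lambda_{.s-1} & -1 \\ \hline
1 & n_{.1} - \sum_i \dfrac{n_{i1}^2}{n_{i.}} & -\sum_i \dfrac{n_{i1}n_{i2}}{n_{i.}} & \cdots & -\sum_i \dfrac{n_{i1}n_{i,s-1}}{n_{i.}} & m_{.1} - \sum_i \dfrac{n_{i1}m_{i.}}{n_{i.}} \\
2 & & n_{.2} - \sum_i \dfrac{n_{i2}^2}{n_{i.}} & \cdots & -\sum_i \dfrac{n_{i2}n_{i,s-1}}{n_{i.}} & m_{.2} - \sum_i \dfrac{n_{i2}m_{i.}}{n_{i.}} \\
\vdots & & & \ddots & \vdots & \vdots \\
s-1 & & & & n_{.s-1} - \sum_i \dfrac{n_{i,s-1}^2}{n_{i.}} & m_{.s-1} - \sum_i \dfrac{n_{i,s-1}m_{i.}}{n_{i.}} \\
s & & & & & 0
\end{array}
$$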

Because of symmetry in the coefficients, those below the diagonal are not shown; indeed, in a systematic computation they are not used. The 0 in the bottom row is appended for the computation of the minimized $S$, if desired. The number of Lagrange multipliers to be solved for directly is $s - 1$, and the remaining ones come by substitution into eq. (21), $\lambda_{.s}$ being counted 0.

A simple procedure for calculating the coefficients in the normal equations (22) is to set up a preparatory table by dividing each $n_{ij}$ in the $i$th row by $\sqrt{n_{i.}}$, and also to write down $m_{i.}/\sqrt{n_{i.}}$ for that row, for use on the right-hand side of the normal equations (compare Tables I and II). In machine calculation the constant divisor $\sqrt{n_{i.}}$ would be left on the keyboard until the entire $i$th row is divided; or, if reciprocal multiplication is preferred, the multiplier $1/\sqrt{n_{i.}}$ would be left on. From this preparatory table, the cumulation of squares and cross-products in the vertical gives the required summations for the coefficients. The usual sum check would be applied.

3. A numerical example of the two dimensional Case II. The fact is that in practice one need not bother about forming and solving the normal equations, because they will be displaced by a simplifying iterative procedure, to be explained in a later section. For illustration, however, we may do an example both ways, first using the normal equations and the adjustment (19), later on accomplishing the same results by the quicker method.
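For orientation only, here is a sketch of a generic iterative proportions scheme: alternately rescale the rows and then the columns until both sets of margins are met. It is an assumption that the procedure of the later section works along these lines. This generic version preserves the sample cross-product ratio $n_{11}n_{22}/(n_{12}n_{21})$, and its result is in general close to, but not identical with, the least squares solution of eq. (19). The data and the name `iterative_proportions` are hypothetical, and the row and column targets must have the same grand total.

```python
def iterative_proportions(n, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Alternately scale rows, then columns, of n toward the target margins.

    Assumes sum(row_targets) == sum(col_targets) and all n_ij > 0.
    Stops when the row totals are within tol after a column step (the
    column totals then hold by construction), or after max_iter sweeps.
    """
    m = [[float(x) for x in row] for row in n]
    for _ in range(max_iter):
        # row step: make every row add up to its target
        for i, target in enumerate(row_targets):
            factor = target / sum(m[i])
            m[i] = [x * factor for x in m[i]]
        # column step: make every column add up to its target
        col_sums = [sum(col) for col in zip(*m)]
        for j, target in enumerate(col_targets):
            factor = target / col_sums[j]
            for row in m:
                row[j] *= factor
        if max(abs(sum(row) - t) for row, t in zip(m, row_targets)) < tol:
            break
    return m


n = [[45, 20], [30, 5]]                        # hypothetical sample
m = iterative_proportions(n, [60, 40], [70, 30])
for row in m:
    print([round(x, 3) for x in row])
```

Each sweep only multiplies whole rows or whole columns by constants, which is why the cross-product ratio of the sample is carried unchanged into the adjusted table.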

We may start with the unitalicized numbers in the 4 × 6 array of Table I, assuming these to be the sampling frequencies $n_{ij}$ to be adjusted. Actually, they were obtained by deflating to 1/20th (for a supposed 5 per cent sample) the New England age × state table on p. 1108 of vol. 2 of the Fifteenth Census of the U. S., 1930, and then varying the deflated values by chance with Tippett's numbers to get our sampling frequencies $n_{ij}$. The italicized entries in Table I represent the final (adjusted) $m_{ij}$, and it is these that we now set out to get. We start off with the sampling frequencies $n_{ij}$ and the known marginal totals $m_{1.}$, $m_{2.}$, etc., and $m_{.1}$, $m_{.2}$, etc., where $m_{i.} = N_{i.}n/N$ and $m_{.j} = N_{.j}n/N$, as in eqs. (6) and (7). The Lagrange multipliers shown along the left-hand and top borders arise in the calculations now to be undertaken.

Table II is the preparatory table advised at the close of the last section. It is derived from Table I by dividing the $i$th row of sample frequencies by $\sqrt{n_{i.}}$. For example, the entry 8.64 in the cell $i = 3$, $j = 2$ comes by dividing 419 by $\sqrt{2352}$, 419 being the entry in the cell of the same indices in Table I, and 2352 being the sum of the third row. The sums at the bottom and right-hand side are for checking the formation of the normal equations. The cumulations of squares and cross-products along the vertical give the summations required for the normal eqs. (22), which now appear numerically as eqs. (23).

(23)

No.     $\lambda_{.1}$     $\lambda_{.2}$     $\lambda_{.3}$     $-1$ (units of $10^{-2}$)
1       7413        -3549        -2354        3197
2                    4441         -544        2356
3                                 3129       -3222
4                                                0
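As a check on eqs. (23) and (24), the sketch below solves the three printed normal equations with NumPy's `numpy.linalg.solve`, filling in the omitted lower-triangle coefficients by symmetry. It assumes, as the layout of (23) suggests, that the right-hand members are to be multiplied by $10^{-2}$; the appended fourth row (the 0 used for the minimized $S$) is not needed for the solution.

```python
import numpy as np

# Coefficients of eqs. (23); the lower triangle, omitted in print,
# is filled in by symmetry.
A = np.array([[ 7413.0, -3549.0, -2354.0],
              [-3549.0,  4441.0,  -544.0],
              [-2354.0,  -544.0,  3129.0]])
# Right-hand members, read as given in units of 10**-2 (an assumption).
b = np.array([3197.0, 2356.0, -3222.0]) * 1e-2

lam_col = np.linalg.solve(A, b)   # lambda_.1, lambda_.2, lambda_.3
print(np.round(lam_col, 5))
```

Rounded to five decimals, the solution should reproduce the multipliers of eq. (24).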
Performing the solution by any favorite procedure, one will obtain

(24)  $\lambda_{.1} = .01182, \quad \lambda_{.2} = .01490, \quad \lambda_{.3} = .00119,$

TABLE I

A table of artificial sample frequencies: an artificial 5 per cent sample of native white persons of native white parentage attending school, by age by state, New England, 1930. The adjusted frequency $m_{ij}$ in each cell is shown just below the corresponding sample frequency $n_{ij}$.

Age                                     7 to 13   14 & 15   16 & 17   18 to 20
                              j =          1         2         3          4
                 $\lambda_{.j}$ =        .0118     .0149     .0012        0       $n_{i.}$, $m_{i.}$
State            i   $\lambda_{i.}$
Maine            1   -.0146     n        3623       781       557        313       5274
                                m        3613       781       550        308       5252
New Hampshire    2   -.0003     n        1570       395       251        155       2371
                                m        1588       401       251        155       2395
Vermont          3   +.0234     n        1553       419       264        116       2352
                                m        1608       435       270        119       2432
Massachusetts    4   -.0162     n       10538      2455      1706       1160      15859
                                m       10492      2452      1680       1141      15766
Rhode Island     5   -.0230     n        1681       353       171        154       2359
                                m        1662       350       167        150       2330
Connecticut      6   -.0034     n        3882       857       544        339       5622
                                m        3916       867       543        338       5662
$n_{.j}$                                22847      5260      3493       2237      33837
$m_{.j}$                                22877      5286      3462       2213      33837

The adjusted $m_{ij}$ are rounded off; hence, when summed, they may occasionally disagree by a unit or so with the expected marginal totals, since the latter arise by deflation from the universe rather than by direct addition of the $m_{ij}$.


whereupon by substitution into eq. (21) comes

(25)  $\lambda_{1.} = -.0146, \quad \lambda_{4.} = -.0162,$

      $\lambda_{2.} = -.0003, \quad \lambda_{5.} = -.0230,$

      $\lambda_{3.} = +.0234, \quad \lambda_{6.} = -.0034.$

The next step is to compute the $m_{ij}$ by eq. (19). Table I is now bordered with the Lagrange multipliers for a convenient arrangement of the factors required, and the calculation is completed. It will be noted that, for example,

(26)  $m_{32} = 419(1 + .0234 + .0149) = 435.$

The $m_{ij}$ thus calculated are shown italicized in Table I. The marginal totals, found by adding the $m_{ij}$ just calculated, do not agree exactly everywhere with the expected totals, because of rounding off to integers; the errors of closure, however, are slight, and it is a simple matter to raise or lower some of the larger cells by a unit or two to force exact satisfaction of the conditions, if this is desired.
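The whole Case II computation of this section, eqs. (19) to (22), can be sketched in NumPy as follows. Assumptions, all worth checking against the printed table: the two Massachusetts sample cells illegible in our copy are taken as 1706 and 1160, the values forced by the column totals $n_{.3} = 3493$ and $n_{.4} = 2237$; the totals $m_{4.} = 15766$, $m_{6.} = 5662$ and $m_{.2} = 5286$ are our readings of garbled entries; and `adjust_case_ii` is a hypothetical name. Because the integer targets are used directly, rather than the two-decimal preparatory table of Table II, the multipliers agree with eqs. (24) and (25) only to roughly three decimals, although $m_{32}$ still comes out at 435 as in eq. (26).

```python
import numpy as np


def adjust_case_ii(n, row_targets, col_targets):
    """Least squares adjustment of an r x s table to known row and column totals.

    Solves the s - 1 reduced normal equations (22) for lambda_.j
    (lambda_.s counted 0), back-substitutes into eq. (21) for lambda_i.,
    and sets m_ij = n_ij (1 + lambda_i. + lambda_.j) as in eq. (19).
    The last column target is not used: it is implied by the others.
    """
    n = np.asarray(n, dtype=float)
    m_row = np.asarray(row_targets, dtype=float)
    m_col = np.asarray(col_targets, dtype=float)
    n_row = n.sum(axis=1)                             # n_i.
    n_col = n.sum(axis=0)                             # n_.j
    w = n / np.sqrt(n_row)[:, None]                   # preparatory table (cf. Table II)
    coef = np.diag(n_col) - w.T @ w                   # coefficients of eqs. (22)
    rhs = m_col - w.T @ (m_row / np.sqrt(n_row))      # right-hand members of (22)
    lam_col = np.zeros(n.shape[1])
    lam_col[:-1] = np.linalg.solve(coef[:-1, :-1], rhs[:-1])
    lam_row = (m_row - n_row - n @ lam_col) / n_row   # eq. (21)
    m = n * (1.0 + lam_row[:, None] + lam_col[None, :])   # eq. (19)
    return m, lam_row, lam_col


n_table = [[3623, 781, 557, 313],      # Maine
           [1570, 395, 251, 155],      # New Hampshire
           [1553, 419, 264, 116],      # Vermont
           [10538, 2455, 1706, 1160],  # Massachusetts
           [1681, 353, 171, 154],      # Rhode Island
           [3882, 857, 544, 339]]      # Connecticut
row_targets = [5252, 2395, 2432, 15766, 2330, 5662]
col_targets = [22877, 5286, 3462, 2213]

m, lam_row, lam_col = adjust_case_ii(n_table, row_targets, col_targets)
print(np.round(lam_col, 5), np.round(lam_row, 4))
print(np.rint(m).astype(int))
```

The row totals and the first $s - 1$ column totals of the unrounded result match their targets exactly, which is a convenient check that the normal equations were formed correctly.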

TABLE II

This comes by dividing each sample frequency in Table I by the corresponding $\sqrt{n_{i.}}$. (This operation would ordinarily be done a row at a time.)

          j = 1        2        3        4     $m_{i.}/\sqrt{n_{i.}}$      Sum
i = 1     49.89    10.76     7.67     4.31          72.32           144.94
    2     32.24     8.11     5.15     3.18          49.19            97.87
    3     32.02     8.64     5.44     2.39          50.15            98.64
    4     83.68    19.49    13.55     9.21         125.19           251.12
    5     34.61     7.27     3.52     3.17          47.97            96.54
    6     51.77    11.43     7.26     4.52          75.51           150.49
Sum      284.21    65.69    42.59    26.78         420.33           839.60

4. The three dimensional problem. Here the $N$ cards of the universe are sorted and counted for one and perhaps a second and third characteristic, and possibly crossed by pairs in various combinations (Cases I-VII). The sample of $n$, however, is crossed by all three characteristics, which is to say that the cell frequencies $n_{ijk}$ are all known (refer to Fig. 2). As before, the adjusted frequencies $m_{ijk}$ are required.

Case I: One set of slice totals known. Assume the slice totals $N_{1..}, N_{2..}, \ldots, N_{r..}$ to be known; the conditions are then

(27)  $\sum_{jk} m_{ijk} = m_{i..} = N_{i..}\,n/N, \quad i = 1, 2, \ldots, r,$

being $r$ in number. The summation to be minimized is

(28)  $S = \sum (m_{ijk} - n_{ijk})^2/n_{ijk},$

being similar to that in eq. (3), except that now there are three indices to be summed over instead of two. Following a procedure similar to that used before, we differentiate eqs. (27) and (28) and introduce the $r$ Lagrange multipliers $\lambda_{i..}$ with eq. (27). The steps are identical with those of the two dimensional Case I, and the result is at once

(29)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..}) = n_{ijk}(m_{i..}/n_{i..}).$

This adjustment, like that shown by eq. (15), is a simple proportionate one, but this time by slices rather than by rows. All cell frequencies having the same $i$ index are raised or lowered in the same proportion.



Fig. 2. Showing the System of Notation for the Cell Frequencies and Marginal Totals in the Three Dimensional Sample

Case II: Two sets of slice totals known. Here, in addition to the slice totals of Case I, we know also

$N_{.1.}, N_{.2.}, \ldots, N_{.s.},$

whence arise the $s - 1$ additional conditions

(30)  $\sum_{ik} m_{ijk} = m_{.j.} = N_{.j.}\,n/N, \quad j = 1, 2, \ldots, s - 1.$




Using the Lagrange multiplier $\lambda_{.j.}$ here, and $\lambda_{i..}$ with eq. (27) as before, we find that

(31)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.}),$

in which $\lambda_{.s.}$ is to be counted zero. This adjustment is proportionate by tubes, the ratio being constant along the $ij$th tube and in fact equal to $m_{ij.}/n_{ij.}$, independent of $k$. Unfortunately we do not here know the face totals $m_{ij.}$ and are unable to make use of the proportionality as we shall in Case IV.

To solve for the $r + s - 1$ Lagrange multipliers we sum the members of eq. (31) over $j$ and then over $i$ and arrive at the normal equations

(32)  $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} = m_{i..} - n_{i..}, \quad i = 1, 2, \ldots, r,$

      $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} = m_{.j.} - n_{.j.}, \quad j = 1, 2, \ldots, s - 1.$

These can be reduced to $s - 1$ equations in precisely the same way that eqs. (20) were reduced, but because of the iterative process to come further on, we shall not pursue the reduction here.

Case III: All three sets of slice totals known. All slice totals

$N_{1..}, N_{2..}, \ldots, N_{r..}$;  $N_{.1.}, N_{.2.}, \ldots, N_{.s.}$;  $N_{..1}, N_{..2}, \ldots, N_{..t}$

now being known, in addition to conditions (27) and (30) we require here

(33)  $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}\,n/N, \quad k = 1, 2, \ldots, t - 1,$

which makes a total of $r + (s - 1) + (t - 1)$, or $r + s + t - 2$, conditions. The same kind of manipulation as used heretofore gives

(34)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.} + \lambda_{..k}),$

with $\lambda_{.s.}$ and $\lambda_{..t}$ to be counted zero. The adjustment is no longer proportionate by slices or tubes, but involves every cell. In practice, once the normal equations are solved and the Lagrange multipliers worked out, one proceeds very much as in the two dimensional Case II: for each of the $t$ slices, corresponding to the $t$ values of $k$, there will be a two dimensional adjustment, the 1 in eq. (19) being replaced now by $1 + \lambda_{..k}$.

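The normal equations for the Lagrange multipliers can be found by performing double summations on eq. (34). The result is

(35)  $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} + \sum_k n_{i.k}\lambda_{..k} = m_{i..} - n_{i..}, \quad i = 1, 2, \ldots, r,$

      $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} + \sum_k n_{.jk}\lambda_{..k} = m_{.j.} - n_{.j.}, \quad j = 1, 2, \ldots, s - 1,$

      $\sum_i n_{i.k}\lambda_{i..} + \sum_j n_{.jk}\lambda_{.j.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}, \quad k = 1, 2, \ldots, t - 1.$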
If these calculations were to be carried out, one would simplify the computation by solving the top row for $\lambda_{i..}$, getting

(36)  $\lambda_{i..} = (1/n_{i..})\{m_{i..} - n_{i..} - \sum_j n_{ij.}\lambda_{.j.} - \sum_k n_{i.k}\lambda_{..k}\},$

and then substituting this into the middle and last rows of eqs. (35) to get a reduced set of $s + t - 2$ normal equations for the Lagrange multipliers $\lambda_{.j.}$ and $\lambda_{..k}$, the numerical values of which when set back into eq. (36) give the $\lambda_{i..}$. In all the summations of eqs. (35) and (36), $\lambda_{.s.}$ and $\lambda_{..t}$ would be counted zero. But here again, the iterative process to be explained later will displace the use of normal equations, so actually we are not interested in reducing them.

Case IV: One set of face totals known. It may be that the $rs$ face totals

$N_{11.}, N_{12.}, \ldots, N_{ij.}, \ldots, N_{rs.}$

are known from crossing the $i$ and $j$ characters in the universe. The conditions are then

(37)  $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N.$

The adjustment here turns out to be

(38)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.});$

but by summing both sides over the index $k$ to evaluate $\lambda_{ij.}$ it is seen that

(39)  $m_{ij.} = n_{ij.}(1 + \lambda_{ij.}),$

whence

(40)  $m_{ijk} = n_{ijk}(m_{ij.}/n_{ij.}).$

This adjustment is thus proportionate by tubes, like that in eq. (31), though here the factor $m_{ij.}/n_{ij.}$ is known and eq. (40) can be applied at once.
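Because the Case IV factor $m_{ij.}/n_{ij.}$ is known, eq. (40) reduces to a single array operation. The sketch below assumes a NumPy array indexed `[i, j, k]` and uses a small hypothetical table and hypothetical face totals; `adjust_case_iv` is an illustrative name.

```python
import numpy as np


def adjust_case_iv(n, face_targets):
    """Eq. (40): scale each (i, j) tube by m_ij. / n_ij. (needs every n_ij. > 0)."""
    n = np.asarray(n, dtype=float)
    face = n.sum(axis=2)                                  # n_ij.
    ratio = np.asarray(face_targets, dtype=float) / face  # m_ij. / n_ij.
    return n * ratio[:, :, None]                          # same factor for every k


n = np.arange(1, 25).reshape(2, 3, 4)     # hypothetical r = 2, s = 3, t = 4 sample
scale = np.array([[1.10, 0.90, 1.00],
                  [1.20, 0.80, 1.05]])
face_targets = n.sum(axis=2) * scale      # hypothetical deflated face totals m_ij.
m = adjust_case_iv(n, face_targets)
print(m.sum(axis=2))                      # reproduces face_targets
```

Every cell in a given tube is multiplied by the same factor, which is exactly the proportionality by tubes noted after eq. (40).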

Case V: One set of face totals, and one set of slice totals known. Sometimes, in addition to the $rs$ face totals of Case IV, the slice totals

$N_{..1}, N_{..2}, \ldots, N_{..t}$

will also be known, in which circumstances the conditions (37) are to be accompanied by

(41)  $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}\,n/N, \quad k = 1, 2, \ldots, t - 1.$

The same procedure as previously applied yields now

(42)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{..k}),$

with $\lambda_{..t}$ to be counted zero. Summations performed over $k$, and then over $i$ and $j$ together, give the normal equations

(43)  $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{..k} = m_{ij.} - n_{ij.},$

      $\sum_{ij} n_{ijk}\lambda_{ij.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}.$

The number of equations is $rs + t - 1$, since $\lambda_{..t}$ does not exist. As before, a simplification can be effected by solving the top row for $\lambda_{ij.}$ and making a substitution into the lower one, but because of the great advantage of the iterative process to be seen further on, we shall not carry out the reduction.

Before going on, it might be noted that although this case is three dimensional, it reduces to the two dimensional Case II if one considers that $ij.$ is one index running through the values $11, 12, \ldots, 21, 22, \ldots, rs$, and that $..k$ is a second index running through the values $1, 2, \ldots, t$. This can be seen by the similarity between eqs. (43) and (20).

Case VI : Two sets of face totals known. If in addition to the face totals of 
Case IV, the face totals 

NM , V.ij, • • • , iV.il 

are also known from further crossing the j and k characters in the universe, we 
shall require 


(44) niiii, = m ,k = N.,kn/N, 

• fc = 1, 2, 1 

in addition to the conditions (37). In place of eq. (40) of Case IV we now
find that

(45)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk})$

in which $\lambda_{.jt}$ is to be counted zero for all j. No simple relation such as eq. (40)
is possible here, because the adjustment is not proportionate by tubes; the
Lagrange multipliers must be evaluated. This can be accomplished by summing
the members of eq. (45) over k and i in turn, resulting in the normal equations


(46)  $n_{ij.}\,\lambda_{ij.} + \sum_k n_{ijk}\,\lambda_{.jk} = m_{ij.} - n_{ij.}$
      $\sum_i n_{ijk}\,\lambda_{ij.} + n_{.jk}\,\lambda_{.jk} = m_{.jk} - n_{.jk}$


Since $\lambda_{.jt}$ does not exist for any value of j, the number of equations is
rs + s(t - 1) = s(r + t - 1). They break up at once into s sets, each of
r + t - 1 equations, one set for every value of j. In fact, the problem can be
considered as s sets of the two dimensional Case II. Any one value of j gives
a slice, which can be looked upon as fulfilling the specifications of the two
dimensional Case II. Each set of normal equations can be reduced in the same
manner that eqs. (20) were reduced.

Case VII: All three sets of face totals known. All totals now being known,
we require

(37)  $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N, \qquad i = 1, 2, \dots, r; \quad j = 1, 2, \dots, s$

(44)  $\sum_i m_{ijk} = m_{.jk} = N_{.jk}\,n/N, \qquad j = 1, 2, \dots, s; \quad k = 1, 2, \dots, t-1$

(47)  $\sum_j m_{ijk} = m_{i.k} = N_{i.k}\,n/N, \qquad i = 1, 2, \dots, r-1; \quad k = 1, 2, \dots, t-1$

The adjusting relation is

(48)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk} + \lambda_{i.k})$

in which $\lambda_{.jt}$ is to be counted zero for any j, $\lambda_{r.k}$ for any k, and $\lambda_{i.t}$ for any i.
The normal equations for the Lagrange multipliers are

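(49)  $n_{ij.}\,\lambda_{ij.} + \sum_k n_{ijk}\,\lambda_{.jk} + \sum_k n_{ijk}\,\lambda_{i.k} = m_{ij.} - n_{ij.}$
      $\sum_i n_{ijk}\,\lambda_{ij.} + n_{.jk}\,\lambda_{.jk} + \sum_i n_{ijk}\,\lambda_{i.k} = m_{.jk} - n_{.jk}$
      $\sum_j n_{ijk}\,\lambda_{ij.} + \sum_j n_{ijk}\,\lambda_{.jk} + n_{i.k}\,\lambda_{i.k} = m_{i.k} - n_{i.k}$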


being rs + rt + st - r - s - t + 1 in number. They can be reduced in the
same way that previous normal equations have been reduced; but here again,
the iterative process will render the use of normal equations unnecessary, except
for theoretical purposes, e.g. justification of the iterative process.
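The counts of multipliers quoted for Cases V, VI and VII can be checked by tallying the multipliers that survive after the ones "counted zero" are dropped. The short Python sketch below does this tally. The function names are illustrative only.

```python
def multipliers_case_v(r, s, t):
    # lambda_{ij.} for every face cell, lambda_{..k} for k < t
    return r * s + (t - 1)


def multipliers_case_vi(r, s, t):
    # lambda_{ij.} for every face cell, lambda_{.jk} for k < t
    return r * s + s * (t - 1)


def multipliers_case_vii(r, s, t):
    # as in Case VI, plus lambda_{i.k} for i < r and k < t
    return r * s + s * (t - 1) + (r - 1) * (t - 1)


print(multipliers_case_vii(6, 4, 3))   # prints 42 for a 6 x 4 x 3 table
```

Expanding the Case VII tally gives rs + rt + st - r - s - t + 1, the number stated above. The Case VI tally factors as s(r + t - 1).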


5. A simplified procedure: iterative proportions. It is well known in least
squares that the number of Lagrange multipliers in any problem is equal to the
number of conditions imposed on the adjustment. Here the conditions have
appeared in sets, depending on which marginal totals are involved, and by com-
parison of eqs. (16) and (29) on the one hand, with eqs. (19), (31), (34), (42),
(45), and (48) on the other, we see that wherever there was only one set of
marginal totals involved we came out with a proportionate adjustment, but
that in all other cases it was not so; the Lagrange multipliers were
unfortunately related to one another through normal equations. We now make
the observation, however, that as a first approximation the adjustments can
all be considered proportionate, and we shall be able to write down an expression
for the error in this approximation, and shall be able to eliminate it by a suc-
cession of proportionate adjustments.

Take the two dimensional Case II for an example. In eq. (21) we
recognize $(1/n_{i.}) \sum_j n_{ij}\lambda_{.j}$ as a weighted average of $\lambda_{.j}$ for the ith row. There
will be a weighted average of $\lambda_{.j}$ for the first row, another for the second, and so on,
one for each value of i; consequently one may appropriately speak of the i-
average of $\lambda_{.j}$, writing it i-av. $\lambda_{.j}$. Substituting from eq. (21) into (19), one
then sees the adjustment (19) appear as

(50)  $m_{ij} = n_{ij}(m_{i.}/n_{i.} + \lambda_{.j} - \text{i-av. } \lambda_{.j}).$

If, on the other hand, $\lambda_{.j}$ had been eliminated from eqs. (20), instead of $\lambda_{i.}$,
the result would have been

(51)  $m_{ij} = n_{ij}(m_{.j}/n_{.j} + \lambda_{i.} - \text{j-av. } \lambda_{i.}).$

From either eq. (50) or (51) it is clear why the adjustment (19) is not propor-
tionate by rows or columns, and why Case II does not break up into r or s sets
of Case I: the reason is that $\lambda_{.j}$ in any cell is not necessarily equal to the average
$\lambda_{.j}$ for that row, nor is $\lambda_{i.}$ in any cell necessarily equal to the average $\lambda_{i.}$ for
that column. If nevertheless one were to make the simple proportionate
adjustment

(52)  $m'_{ij} = n_{ij}(m_{i.}/n_{i.})$

along the horizontal in the ith row, the horizontal conditions (4) will be en-
forced but not the vertical ones (5); i.e., it will be found that $m'_{i.} = m_{i.}$, but
that usually not all $m'_{.j} = m_{.j}$. This is because eq. (52) effects only a partial
adjustment, each $m'_{ij}$ being in error through the disparity between the $\lambda_{.j}$ proper
to the jth column and the average of all the $\lambda_{.j}$ for the ith row, as seen in
eq. (50). This error can then be diminished by turning the process around and
subjecting these $m'_{ij}$ to a proportionate adjustment in the vertical according to
the equation

(53)  $m''_{ij} = m'_{ij}(m_{.j}/m'_{.j}),$

which may be considered an application of eq. (51) wherein the disparity be-
tween any $\lambda_{i.}$ and the average $\lambda_{i.}$ for the jth column has been neglected. It is
the vertical conditions that will now be found satisfied, but perhaps not all of
the horizontal ones, because some of the row totals may have been disturbed.
The cycle initiated by eq. (52) is therefore repeated, and the process is con-
tinued until the table reproduces itself and becomes rigid with the satisfaction
of all the conditions, both horizontal and vertical. The final results coincide
with the least squares solution, which is thus accomplished without the use of
normal equations.
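The cycle of eqs. (52) and (53) translates directly into a loop. Below is a minimal Python sketch that keeps full floating-point precision, with no rounding, and stops once the row totals also agree after a column step. The function name, tolerance and cycle limit are illustrative assumptions. The compressed figures of Table VI serve as sample input.

```python
def iterative_proportions(n, row_targets, col_targets, tol=1e-10, max_cycles=1000):
    """Alternate eq. (52) (rows) and eq. (53) (columns) until the table is rigid."""
    m = [[float(x) for x in row] for row in n]
    for cycle in range(1, max_cycles + 1):
        for i, target in enumerate(row_targets):          # eq. (52)
            factor = target / sum(m[i])
            m[i] = [x * factor for x in m[i]]
        col_sums = [sum(col) for col in zip(*m)]           # eq. (53)
        m = [[x * t / c for x, t, c in zip(row, col_targets, col_sums)] for row in m]
        # The columns now agree exactly; stop once the rows also agree.
        if all(abs(sum(row) - t) <= tol * t for row, t in zip(m, row_targets)):
            return m, cycle
    raise RuntimeError("no convergence within max_cycles")


m, cycles = iterative_proportions([[18965, 9250], [3882, 1740]],
                                  [28175, 5662], [22877, 10960])
```

Each pass enforces one set of conditions exactly and disturbs the other set only slightly. This is why the row discrepancies shrink from cycle to cycle, as the paragraph above explains.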

Usually two cycles suffice. In practice the work proceeds rapidly, requiring
only about one-seventh as much time as setting up the normal equations and
solving them. Tables III-V show the various stages of the work when
the method of iterative proportions is applied to the sample frequencies of
Table I. It will be noticed that the results of the third approximation (Table V)
are final, since if the process were continued, the table would only reproduce
itself.
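The hand computation behind Tables III-V can be replayed stage by stage. The Python sketch below assumes, as the printed tables suggest, that each stage was rounded to whole units before the next stage began. Starting from Table III, it applies eq. (53) to produce the second stage and then eq. (52) to produce the third. The function names are illustrative.

```python
def adjust_rows(table, targets):
    """Eq. (52): scale each row to its known total, rounding to whole units."""
    return [[round(x * t / sum(row)) for x in row] for row, t in zip(table, targets)]


def adjust_columns(table, targets):
    """Eq. (53): scale each column to its known total, rounding to whole units."""
    col_sums = [sum(col) for col in zip(*table)]
    return [[round(x * t / c) for x, t, c in zip(row, targets, col_sums)] for row in table]


ROW_TOTALS = [5252, 2395, 2432, 15766, 2330, 5662]     # m_i. as printed in Tables III-V
COL_TOTALS = [22877, 5285, 3462, 2213]                 # m_.j as printed in Tables III-V
TABLE_III = [[3608, 778, 555, 312], [1586, 399, 254, 157], [1606, 433, 273, 120],
             [10476, 2441, 1696, 1153], [1660, 349, 169, 152], [3910, 863, 548, 341]]

table_iv = adjust_columns(TABLE_III, COL_TOTALS)   # second stage
table_v = adjust_rows(table_iv, ROW_TOTALS)        # third stage
```

Under that rounding assumption, the two computed stages agree cell for cell with the entries of Tables IV and V.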

The same process can be extended to three or more dimensions with an even
greater relative saving in time. To see how the method of iterative proportions
applies in one of the three dimensional cases, we may go back to Case III. By
the substitution afforded through eq. (36) the adjusting eq. (34) may be put
into the form


TABLE III

The method of iterative proportions applied to the data of Table I. First stage:
a proportionate adjustment by rows by eq. (52). Note that $m'_{i.} = m_{i.}$,
but that $m'_{.j} \neq m_{.j}$

              j = 1      2      3      4    $m'_{i.}$   $m_{i.}$
i = 1          3608    778    555    312      5253     5252
i = 2          1586    399    254    157      2396     2395
i = 3          1606    433    273    120      2432     2432
i = 4         10476   2441   1696   1153     15766    15766
i = 5          1660    349    169    152      2330     2330
i = 6          3910    863    548    341      5662     5662
$m'_{.j}$     22846   5263   3495   2235     33839
$m_{.j}$      22877   5285   3462   2213              33837


TABLE IV

A continuation of the process initiated in Table III. The figures in Table III
are now adjusted proportionately by columns according to eq. (53). The vertical
totals $m''_{.j}$ and $m_{.j}$ now are equal, but the agreement of the horizontal totals
accomplished in Table III has been slightly disturbed

              j = 1      2      3      4   $m''_{i.}$   $m_{i.}$
i = 1          3613    781    550    309      5253     5252
i = 2          1588    401    252    155      2396     2395
i = 3          1608    435    270    119      2432     2432
i = 4         10490   2451   1680   1142     15763    15766
i = 5          1662    350    167    151      2330     2330
i = 6          3915    867    543    338      5663     5662
$m''_{.j}$    22876   5285   3462   2214     33837
$m_{.j}$      22877   5285   3462   2213              33837


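(54)  $m_{ijk} = n_{ijk}(m_{i..}/n_{i..} + \lambda_{.j.} + \lambda_{..k} - \text{i-av. } \lambda_{.j.} - \text{i-av. } \lambda_{..k}).$

Equally well it could have been written

(55)  $m_{ijk} = n_{ijk}(m_{.j.}/n_{.j.} + \lambda_{i..} + \lambda_{..k} - \text{j-av. } \lambda_{i..} - \text{j-av. } \lambda_{..k}),$

or

(56)  $m_{ijk} = n_{ijk}(m_{..k}/n_{..k} + \lambda_{i..} + \lambda_{.j.} - \text{k-av. } \lambda_{i..} - \text{k-av. } \lambda_{.j.}).$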

Any of these three equations shows why the adjustment (34) is not propor-
tional by slices, and why this case does not break up into r or s or t sets of the
three dimensional Case I. As a first approximation it does, as is now clear
from these three equations, and by making successive proportionate adjust-
ments we may thus arrive at the least squares values. To go about the work
we could first calculate the values of

(57)  $m'_{ijk} = n_{ijk}(m_{i..}/n_{i..})$

then

(58)  $m''_{ijk} = m'_{ijk}(m_{.j.}/m'_{.j.})$


TABLE V

The cycle is commenced again. The figures of Table IV are subjected to a propor-
tionate adjustment by rows, according to eq. (52). And since these results turn
out to be almost a reproduction of Table IV but with both horizontal and vertical
conditions satisfied, they are considered final. The agreement with the $m_{ij}$ in
Table I should be noted

              j = 1      2      3      4  $m'''_{i.}$   $m_{i.}$
i = 1          3612    781    550    309      5252     5252
i = 2          1587    401    252    155      2395     2395
i = 3          1608    435    270    119      2432     2432
i = 4         10492   2451   1680   1142     15765    15766
i = 5          1662    350    167    151      2330     2330
i = 6          3914    867    543    338      5662     5662
$m'''_{.j}$   22875   5285   3462   2214     33836
$m_{.j}$      22877   5285   3462   2213              33837


followed by

(59)  $m'''_{ijk} = m''_{ijk}(m_{..k}/m''_{..k}).$

These three successive adjustments would constitute a cycle, which would then
be repeated in whole or in part until the table becomes rigid with the satis-
faction of all three sets of conditions.
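The cycle of eqs. (57)-(59) generalizes to any collection of marginal totals. The following Python sketch is an illustration, not the authors' routine. It stores a table as a dictionary of cells and, for each set of totals in turn, scales every cell by the ratio of the known total to the current total. With the one-way totals $m_{i..}, m_{.j.}, m_{..k}$ this is Case III. With the two-way face totals it is Case VII. The sample table and the "universe" that supplies consistent totals are both hypothetical.

```python
from itertools import product


def marginal(table, axes):
    """Sum a {cell: value} table over every index not listed in `axes`."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + value
    return totals


def iterative_proportions_nd(n, constraints, cycles=500):
    """constraints: list of (axes, targets); one pass over the list is one cycle."""
    m = {cell: float(v) for cell, v in n.items()}
    for _ in range(cycles):
        for axes, targets in constraints:
            current = marginal(m, axes)
            for cell in m:
                key = tuple(cell[a] for a in axes)
                m[cell] *= targets[key] / current[key]
    return m


# Hypothetical 2 x 3 x 2 sample n and a "universe" u that supplies consistent totals.
cells = list(product(range(2), range(3), range(2)))
n = {c: 5 + c[0] + 2 * c[1] + 3 * c[2] for c in cells}
u = {c: 4 + 2 * c[0] + c[1] * c[2] + c[1] for c in cells}

case_iii = [((0,), marginal(u, (0,))), ((1,), marginal(u, (1,))), ((2,), marginal(u, (2,)))]
case_vii = [((0, 1), marginal(u, (0, 1))), ((1, 2), marginal(u, (1, 2))),
            ((0, 2), marginal(u, (0, 2)))]
m3 = iterative_proportions_nd(n, case_iii)
m7 = iterative_proportions_nd(n, case_vii)
```

A fixed number of cycles is used here for simplicity. In practice one would stop once a cycle leaves the table essentially unchanged, which is the "rigid" state described above.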

6. Simplification when only one cell requires adjustment. On occasion it
happens in sampling work that one is especially interested in one particular cell
of the universe, and would like to have a result for it in advance, before the other
cells are adjusted. Sometimes it even happens that the others individually
are of no particular concern. In such circumstances one merely places the cell
of interest in one corner of the table by an appropriate interchange of rows and
columns, and then compresses the rest of the table into the cells adjacent to it.
In the two dimensional Case II one would thus work with a 2 × 2 table, one
corner cell being the one of special interest, the other three being the result of
compression. The marginal totals of the row and column belonging to the cell
of interest are unaffected. For illustration we may suppose that from the
sample shown in Table I we require only $m_{61}$. We then start with the 2 × 2
Table VI, which is derived from Table I by compression. Commencing with
Table VI, one might first adjust by rows according to eq. (52), then by columns
by eq. (53). One cycle of iterative proportions is sufficient, as is seen in Table

TABLE VI

Derived from Table I by compression, the cell i = 6, j = 1 requiring adjustment

              j = 1    j = 2-4    $n_{i.}$    $m_{i.}$
i = 1-5       18965       9250      28215      28175
i = 6          3882       1740       5622       5662
$n_{.j}$      22847      10990      33837
$m_{.j}$      22877      10960                 33837


TABLE VII

A proportionate adjustment of Table VI

              Rows adjusted by eq. (52)        Columns adjusted by eq. (53)
              j = 1   j = 2-4    Total         j = 1   j = 2-4    Total
i = 1-5       18938      9237    28175         18962      9213    28175
i = 6          3910      1752     5662          3915      1747     5662
Total         22848     10989    33837         22877     10960    33837

Conclusion: $m_{61}$ = 3915


VII, and the value 3915 found for $m_{61}$ is in good agreement with its value shown
in Tables I and V. The scheme of compression provides a quick method of
getting out an advance adjustment for a cell of special interest, and the result
so obtained will ordinarily be in good agreement with what comes later when
and if all the cells are adjusted.
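Compression followed by a single cycle is easy to mechanize. In the Python sketch below, `compress` is a hypothetical helper that sums a two-way table into the 2 × 2 layout of Table VI, with the row of interest second and its column first. One rounded cycle of eqs. (52) and (53) is then applied to the figures of Table VI. Rounding each stage to whole units is an assumption, chosen to match the printed tables.

```python
def compress(table, i0, j0):
    """Collapse a two-way table to 2 x 2, keeping cell (i0, j0) alone (0-based).
    Layout follows Table VI: the row of interest comes second, its column first."""
    out = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            out[1 if i == i0 else 0][0 if j == j0 else 1] += x
    return out


def adjust_rows(table, targets):
    """Eq. (52), rounded to whole units."""
    return [[round(x * t / sum(row)) for x in row] for row, t in zip(table, targets)]


def adjust_columns(table, targets):
    """Eq. (53), rounded to whole units."""
    col_sums = [sum(col) for col in zip(*table)]
    return [[round(x * t / c) for x, t, c in zip(row, targets, col_sums)] for row in table]


TABLE_VI = [[18965, 9250], [3882, 1740]]
stage_1 = adjust_rows(TABLE_VI, [28175, 5662])          # rows by eq. (52)
stage_2 = adjust_columns(stage_1, [22877, 10960])       # columns by eq. (53)
print(stage_2[1][0])   # adjusted m_61; prints 3915
```

The compressed targets are simply the known totals summed in the same way. Here, 28175 is the total of rows 1-5 and 10960 the total of columns 2-4.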

In the three dimensional Cases II, III, V, VI, and VII, one compresses the
original table to a 2 × 2 × 2 table, and then uses the method of iterative propor-
tions. (The other cases do not require consideration, since they are propor-
tionate adjustments wherein one is already at liberty to adjust as few or as
many cells as he likes without altering the equations or the routine.) The same
procedure can be extended to the adjustment of two cells, the only modification
being that in two dimensions we shall compress to a 2 × 3 or a 3 × 3 table,
depending on whether the two cells do or do not lie in the same row or column.
In three dimensions we compress to a 2 × 2 × 3, a 2 × 3 × 3, or a 3 × 3 × 3
table; the first if the two cells lie in the same i, j, or k tube, the second if they
lie in the same slice but not in the same tube, the third if they are in separate
slices.

7. Some remarks on the accuracy of an adjustment. A least squares adjust¬ 
ment of sampling results must be regarded as a systematic procedure for 
obtaining satisfaction of the conditions imposed, and at the same time effecting 
an improvement of the data in the sense of obtaining results of smaller variance 
than the sample itself, under ideal conditions of sampling from a stable universe. 
It must not be supposed that any or all of the adjusted $m_{ij}$ in any table are
necessarily "closer to the truth" than the corresponding sampling frequencies
$n_{ij}$, even under ideal conditions. As for the standard errors of the adjusted
results, they can easily be estimated for the ideal case by making use of the
calculated chi-square. For predictive purposes, however (which can be regarded
as the only possible use of a census by any method, sample or complete), it is
far preferable, in fact necessary, to get some idea of the errors of sampling by
actual trial, such as by a comparison of the sampling results with the universe,
as can often be arranged by means of controls. There is another aspect to the
problem of error: even a 100 per cent count, though strictly accurate, is
not by itself useful for prediction, except so far as we can assert on other grounds
what secular changes are taking place.

In conclusion it is a pleasure to record our appreciation of the assistance of 
Miss Irma D. Friedman and Mr. Wilson H. Grabill for putting the formulas 
and procedure into actual operation with census data, and thereby disclosing 
defects in earlier drafts of the manuscript. 

Bureau of the Census,

Washington 



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


THE STANDARD ERRORS OF THE GEOMETRIC AND HARMONIC
MEANS AND THEIR APPLICATION TO INDEX NUMBERS¹

By Nilan Norris

Attempts to derive useful expressions for estimating the standard deviations 
of the sampling errors of the geometric and harmonic means have not yielded 
results comparable with those afforded by the modern theory of estimation,
including fiducial inference. There are in the literature of probability theory 
certain theorems which can be applied to obtain these desired results in a 
straightforward manner. The use of forms for estimating standard errors is 
subject to certain conditions which are not always fulfilled, particularly in the 
case of time series. An understanding of these limitations should deter those 
who may be tempted to judge the significance of phenomena such as price 
changes solely on the basis of estimated standard errors of indexes. 

1. Statement of formulas. The standard error of the geometric mean of a
sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \dots, x_n$ is

$\sigma_G = \theta_1\,\sigma_{\log x}/\sqrt{n}$,

where $\theta_1$ is the population geometric mean of the variates; $\sigma_{\log x}$ is the standard
deviation of the logarithms in the population, given by
$\sigma_{\log x} = [E\{[\log x - E(\log x)]^2\}]^{1/2}$; and n is the number of individuals
comprising the sample. The estimate of the standard error of the geometric
mean is

$s_G = G\,s_{\log x}/\sqrt{n-1}$,

where G is the sample geometric mean, that is, the estimate of $\theta_1$; $s_{\log x}$ is the
estimate of $\sigma_{\log x}$; and n - 1 is the number of degrees of freedom of the sample.

¹ This article summarizes two papers presented at sessions of the Institute of Mathe-
matical Statistics at Detroit, Michigan, on December 27, 1938, and at Philadelphia, Penn-
sylvania, on December 27, 1939. The results given herein can be derived by several meth-
ods, which vary somewhat as to degree of rigor. The writer wishes to acknowledge his
indebtedness to the referee for suggesting a proof based on a probability theorem stated
by J. L. Doob, "The limiting distributions of certain statistics," Annals of Math. Stat.,
Vol. 6 (1935), pp. 160-169. The standard deviation formulas obtained follow as an applica-
tion of this theorem, as will be seen by reference to it. Obviously the asymptotic variance
formulas of many other statistics (estimates of parameters) can be obtained in a similar
manner.
The standard error of the harmonic mean of a sequence of positive inde-
pendent chance variables denoted by $x_i = x_1, x_2, \dots, x_n$ is

$\sigma_H = \dfrac{\sigma_{1/x}}{\alpha^2\sqrt{n}} = \theta_2^2\,\sigma_{1/x}/\sqrt{n}$,

where the population harmonic mean of the variates is $\theta_2 = 1/\alpha = 1/E(1/x)$;
the standard deviation of 1/x in the population is $\sigma_{1/x} = [E\{[1/x - E(1/x)]^2\}]^{1/2}$;
and n is the number of observations comprising the sample. The estimate of the
standard error of the harmonic mean is

$s_H = \dfrac{s_{1/x}}{a^2\sqrt{n-1}} = H^2 s_{1/x}/\sqrt{n-1}$,

where the estimate of α is given by $a = 1/H = \frac{1}{n}\sum 1/x_i$; in which $s_{1/x}$ is the standard
deviation of the reciprocals of the observations comprising the sample; and
n - 1 is the number of degrees of freedom of the sample.
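The four quantities defined above can be computed in a few lines. The Python sketch below rests on two assumptions. First, it uses natural logarithms, which the derivation in section 2 (through $\theta_1 = e^{\xi}$) implies. Second, it takes $s_{\log x}$ and $s_{1/x}$ as standard deviations with divisor n, so that dividing by $\sqrt{n-1}$ supplies the degrees-of-freedom correction. The function name is hypothetical.

```python
import math


def geometric_and_harmonic_with_errors(xs):
    """Return G, s_G, H, s_H for a sample of positive observations."""
    n = len(xs)
    logs = [math.log(x) for x in xs]                # natural logarithms
    mean_log = sum(logs) / n
    s_log = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / n)
    g = math.exp(mean_log)                          # sample geometric mean
    recips = [1.0 / x for x in xs]
    a = sum(recips) / n                             # estimate of alpha = E(1/x)
    s_rec = math.sqrt(sum((v - a) ** 2 for v in recips) / n)
    h = 1.0 / a                                     # sample harmonic mean
    s_g = g * s_log / math.sqrt(n - 1)
    s_h = h ** 2 * s_rec / math.sqrt(n - 1)
    return g, s_g, h, s_h


g, s_g, h, s_h = geometric_and_harmonic_with_errors([1.0, 2.0, 4.0])
```

For the toy sample 1, 2, 4 this gives G = 2 and H = 12/7. As always, a three-observation sample is far too small for the asymptotic formulas to be trusted.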


2. Derivation of formulas. These forms can be obtained by application of
the Laplace-Liapounoff theorem² as follows. Let $x_i = x_1, x_2, \dots, x_n$ be a set of
positive independent chance variables with the same distribution function,
where the expectations $E(x_i)$ and $E(x_i^2)$ exist, and where $\sigma^2 = E\{[x_i - E(x_i)]^2\}$
> 0. The last condition is imposed to eliminate the trivial case in which the $x_i$
are all equal and their distribution is confined to a single point. The geometric
mean of the $x_i$ is $G = (x_1 x_2 \cdots x_n)^{1/n}$, and the harmonic mean of the $x_i$ is

$H = \dfrac{n}{\sum_{i=1}^{n} 1/x_i}.$

It is necessary to assume that both $\sigma_{\log x}$ and $\sigma_{1/x}$ are finite, and that in the
case of both log x and 1/x at least one moment of order higher than two of the
respective variates is also finite. The requirement that the variance and at
least one moment higher than the variance be finite can be weakened in various
ways, but this is a trivial consideration, since nearly all distributions of any
importance have finite third moments.³ Certain rarely occurring types of
distributions, such as the Cauchy distribution, have infinite variance. In such
cases, standard error formulas as ordinarily used are not valid.

Let $E(\log x) = \xi$ and $E(1/x) = \alpha$. By the Laplace-Liapounoff theorem,
except for terms of order $1/\sqrt{n}$, the limiting distributions of

$\dfrac{\sqrt{n}(\log G - \xi)}{\sigma_{\log x}}$ and $\dfrac{\sqrt{n}(1/H - \alpha)}{\sigma_{1/x}}$

are normal with zero arithmetic means and unit variances.

That is, if C represents a set of conditions on chance variables, and P(C) is the
probability that these conditions are satisfied, then


² A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse
der Mathematik und ihrer Grenzgebiete, J. Springer, Berlin, 1933, Vol. II, No. 4, pp. 1-8;
J. L. Doob, op. cit., pp. 160-169; and S. S. Wilks, Statistical Inference, 1936-1937, Edwards
Brothers, Inc., Ann Arbor, 1937, pp. 39 f.

³ For a more detailed discussion of this matter see Wilks, op. cit., pp. 39 f.
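$\lim_{n\to\infty} P\left[\dfrac{\sqrt{n}(\log G - \xi)}{\sigma_{\log x}} < t\right] = \lim_{n\to\infty} P\left[\dfrac{\sqrt{n}(1/H - \alpha)}{\sigma_{1/x}} < t\right] = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-x^2/2}\,dx.$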


In order to use these relations in obtaining the limiting distributions of the
geometric and harmonic means, it is necessary to suppose that the sequence of
random chance variables $F_n$ converges in probability (converges stochasti-
cally) to μ, and that the sequence of random chance variables $\sqrt{n}(F_n - \mu)$ has
a normal limiting distribution with zero arithmetic mean and variance $\sigma^2$.
Also, it is necessary to assume that the real-valued function f(x) has a Taylor
expansion valid in the neighborhood of μ. If $f'(\mu) \neq 0$, only the first two terms
of the series are needed. The required expansion is given by

$f(x) = f(\mu) + (x - \mu)f'(\mu) + \tfrac{1}{2}(x - \mu)^2 f''[\mu + \vartheta(x - \mu)]$,

where $0 < \vartheta < 1$, and $f''(x)$ is continuous in the neighborhood of μ. When these
conditions are fulfilled, the limiting distribution of $\sqrt{n}[f(F_n) - f(\mu)]$ is normal
with an arithmetic mean of zero and a variance of $\sigma^2[f'(\mu)]^2$.

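Let $f(\log G) = G = e^{\log G}$, and use the expansion given by

$G = e^{\xi} + (\log G - \xi)e^{\xi} + \tfrac{1}{2}(\log G - \xi)^2 e^{\xi + \vartheta(\log G - \xi)}$.

Since $\theta_1 = e^{\xi}$, it follows that the limiting distribution of $\sqrt{n}(G - \theta_1)$ is normal
with an arithmetic mean of zero and a variance of $\theta_1^2\,\sigma_{\log x}^2$.

Similarly, taking $f(1/H) = H$, so that $f'(\alpha) = -1/\alpha^2$, it can be shown that the
limiting distribution of $\sqrt{n}(H - \theta_2)$ is normal with an arithmetic mean of zero and a
variance of $\theta_2^4\,\sigma_{1/x}^2$, where $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$.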
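The limiting variances can be checked by simulation. The Python sketch below assumes a lognormal universe with log x ~ N(0, σ²), chosen only because its moments are known in closed form. For this universe $\theta_1 = 1$ and $\sigma_{\log x} = \sigma$. The reciprocal 1/x is again lognormal, with $E(1/x) = e^{\sigma^2/2}$ and variance $e^{\sigma^2}(e^{\sigma^2} - 1)$. The sketch compares the spread of G and H over repeated samples with $\theta_1\sigma_{\log x}/\sqrt{n}$ and $\theta_2^2\sigma_{1/x}/\sqrt{n}$. The sample size, replication count and seed are arbitrary choices.

```python
import math
import random
import statistics


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


def simulated_sd(stat, n, reps, sigma, seed=12345):
    """Standard deviation of `stat` over `reps` lognormal samples of size n."""
    rng = random.Random(seed)
    values = [stat([math.exp(rng.gauss(0.0, sigma)) for _ in range(n)])
              for _ in range(reps)]
    return statistics.pstdev(values)


sigma, n = 0.5, 400
theory_g = 1.0 * sigma / math.sqrt(n)                       # theta_1 = 1
alpha = math.exp(sigma ** 2 / 2)                            # E(1/x)
sd_recip = math.sqrt(math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1))
theory_h = (1 / alpha) ** 2 * sd_recip / math.sqrt(n)       # theta_2 = 1/alpha

sim_g = simulated_sd(geometric_mean, n, 1000, sigma)
sim_h = simulated_sd(harmonic_mean, n, 1000, sigma)
```

With 1000 replications the simulated standard deviations should fall within a few per cent of the asymptotic values. This agreement says nothing about the serially correlated data discussed in section 3.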

It is of some interest to observe that the expressions for the standard errors 
of the geometric and harmonic means correspond with the forms previously 
given for the standard errors of two efficient ratio-measures of relative variation,⁴
namely, 

_ 

(faiA. — -^oaIq, aria Hb/o = -^(ro/a, 

V $1 

where $\theta_1/\theta$ is the population geometric-arithmetic ratio, and $\theta_2/\theta_1$ is the popula-
tion harmonic-geometric ratio.


3. Limitations of standard-error estimates. Application of these forms is
subject to the usual conditions for drawing sound inferences on the basis of the
representative method. Fiducial argument should be employed to avoid certain
untenable assumptions of the outmoded method of using standard errors.
Estimates of the standard deviations of sampling errors do not constitute an 
ultimate test of significance which can be applied with a high degree of success 
to all types of problems. In general, such estimates cannot be relied upon with a 


⁴ Nilan Norris, "Some efficient measures of relative dispersion," Annals of Math. Stat.,
Vol. 9 (1938), pp. 214-220.
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high degree of confidence when they are used as tests of significance for index 
numbers, since in nearly all time series there exists an appreciable degree of 
serial correlation, persistence, or lack of independence among successive items of 
any sample. 

4. Bibliographical note. Certain aspects of the sampling distribution of the
geometric mean have been discussed by Burton H. Camp.⁵ Attempts to derive
forms for estimating the standard errors of index numbers have been made by
Truman L. Kelley⁶ and Irving Fisher,⁷ and an empirical study of the sampling
fluctuations of indexes has been made by E. C. Rhodes.⁸ Although various
special tests of significance for time series have been proposed,⁹ at the present
time no generally satisfactory procedure has appeared.

Hunter College,

New York, N. Y.


⁵ Burton H. Camp, "Notes on the distribution of the geometric mean," Annals of Math.
Stat., Vol. 9 (1938), pp. 221-226.

⁶ Truman L. Kelley, "Certain Properties of Index Numbers," Quarterly Publications of
Am. Stat. Assn., Vol. 17, New Series 135, Sept. 1921, pp. 826-841.

⁷ Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company, New York,
1927, 3d ed., pp. 225-229, 342-345, and Appendix I, pp. 407 and 430 f.

⁸ E. C. Rhodes, "The precision of index numbers," Roy. Stat. Soc. Jour., Vol. 99 (1936),
Part I, pp. 142-146, and Part II, pp. 367-369.

⁹ Some of the more recent papers dealing with this matter are: G. Tintner, "On tests of
significance in time series," Annals of Math. Stat., Vol. 10 (1939), pp. 139-143; "The analysis
of economic time series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 93-100; L. R. Hafstad,
"On the Bartels technique for time-series analysis, and its relation to the analysis of
variance," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 347-361; and Lila F. Knudsen, "Inter-
dependence in a series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 507-514.


A NOTE ON THE USE OF A PEARSON TYPE III FUNCTION IN

RENEWAL THEORY

By A. W. Brown

One of the methods suggested by A. J. Lotka¹ for the derivation of the renewal
function may be briefly summarized as follows.

The method consists of dissecting the total renewal function into "genera-
tions". The original installation constitutes the zero generation, the units
introduced to replace disused units of the zero generation constitute the first
generation, renewal of these the second, and so on. Let f(x) be the "mortality"
function, the same for all generations; f(x) is a function satisfying the usual
conditions of a distribution function. Adopting Lotka's notation, let N be the
number of units in the original collection, $B_1(t)\,dt$ the number of objects intro-
duced between times t and t + dt and belonging to the first generation, $B_2(t)\,dt$
a similar expression for the second generation, etc. Then $B_1(t), B_2(t), \dots$ may
be regarded as renewal density functions for the various generations.

¹ A. J. Lotka, "A Contribution to the Theory of Self-Renewing Aggregates, With Special
Reference to Industrial Replacement," Annals of Math. Stat., Vol. 10 (1939), p. 1.

Now, evidently,


(1)  $B_1(t) = N f(t)$

(2)  $B_2(t) = \int_0^t B_1(t - x) f(x)\,dx$

and in general

(3)  $B_n(t) = \int_0^t B_{n-1}(t - x) f(x)\,dx.$


Summation of the contributions of the successive generations gives for the total
renewal at the time t

(4)  $B(t) = B_1(t) + \int_0^t B(t - x) f(x)\,dx.$
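Eq. (4) can also be solved numerically, independently of any generation-by-generation summation. The Python sketch below does this. It discretizes t in steps of h and applies the trapezoidal rule to the integral. The endpoint term at x = 0 involves the unknown B(t) itself, so it is moved to the left side. For illustration it assumes the Pearson Type III density introduced in the next paragraph, restricted to k ≥ 1 so that f(0) is finite. Both function names are hypothetical.

```python
import math


def type_iii_density(x, c, k):
    """Pearson Type III density c^k x^(k-1) e^(-cx) / Gamma(k); k >= 1 assumed."""
    if x == 0.0:
        return c if k == 1 else 0.0
    return math.exp(k * math.log(c) + (k - 1) * math.log(x) - c * x - math.lgamma(k))


def renewal_density_numeric(N, c, k, h, steps):
    """Solve eq. (4), B(t) = N f(t) + int_0^t B(t - x) f(x) dx, at t = 0, h, ..., steps*h,
    using the trapezoidal rule with the x = 0 endpoint term moved to the left side."""
    f = [type_iii_density(i * h, c, k) for i in range(steps + 1)]
    B = [N * f[0]]
    for n in range(1, steps + 1):
        inner = 0.5 * B[0] * f[n] + sum(B[n - i] * f[i] for i in range(1, n))
        B.append((N * f[n] + h * inner) / (1 - 0.5 * h * f[0]))
    return B


B2 = renewal_density_numeric(N=1.0, c=1.0, k=2, h=0.01, steps=500)
```

The discretization error is of order h². For k = 1 the computed values stay close to the constant Nc, and for k = 2 they stay close to $\frac{1}{2}Nc(1 - e^{-2ct})$. Both are consistent with the closed forms obtained below for integral k.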


In this note we propose to use a Pearson Type III function for f(x) and observe
what form our equations then assume. The Pearson Type III function

$f(x) = \dfrac{c^k}{\Gamma(k)}\,x^{k-1} e^{-cx}$  (c > 0, k > 0)

appears to be a reasonable one to use in many practical situations. The two
parameters c and k give it a considerable amount of flexibility. The fact that
this function has an unlimited range in one direction is relatively unimportant
from a practical point of view, as is well known from the experience of fitting
curves of this type to skewed data with limited range. Of course the question of
whether a Type III curve is appropriate can be answered more objectively by
using the usual Pearson curve-fitting criteria, $\beta_1$, $\beta_2$ and κ. We have, then,
substituting in (1)






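(5)  $B_1(t) = \dfrac{N c^k}{\Gamma(k)}\,t^{k-1} e^{-ct}$

and from (2)

(6)  $B_2(t) = \int_0^t \dfrac{N c^k}{\Gamma(k)}(t - x)^{k-1} e^{-c(t - x)} \cdot \dfrac{c^k}{\Gamma(k)}\,x^{k-1} e^{-cx}\,dx$

(7)  $B_2(t) = \dfrac{N c^{2k} e^{-ct}}{[\Gamma(k)]^2}\int_0^t (t - x)^{k-1} x^{k-1}\,dx.$

If, now, we set x = ty, the integral in (7) reduces to

$\int_0^t (t - x)^{k-1} x^{k-1}\,dx = t^{2k-1}\int_0^1 (1 - y)^{k-1} y^{k-1}\,dy = t^{2k-1}\,\dfrac{[\Gamma(k)]^2}{\Gamma(2k)}.$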
Hence,

(8)  $B_2(t) = \dfrac{N c^{2k}}{\Gamma(2k)}\,t^{2k-1} e^{-ct}$

and in general

(9)  $B_n(t) = \dfrac{N c^{nk}}{\Gamma(nk)}\,t^{nk-1} e^{-ct}.$

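Summing the contributions of the several generations, we have for the total
renewal function

(10)  $B(t) = N c\,e^{-ct}\left[\dfrac{(ct)^{k-1}}{\Gamma(k)} + \dfrac{(ct)^{2k-1}}{\Gamma(2k)} + \dfrac{(ct)^{3k-1}}{\Gamma(3k)} + \cdots\right].$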

If k is a positive integer ≥ 3, (10) can easily be summed to a form which
shows immediately its damped periodic nature. Even if k is positive but not
an integer, it can be shown by continuity considerations that the function B(t)
defined by (10) has periodic properties.

Assuming k to be a positive integer, then, and setting z = ct, we may write
the expression in brackets in (10) as

(11)  $f(z) = \dfrac{z^{k-1}}{(k-1)!} + \dfrac{z^{2k-1}}{(2k-1)!} + \cdots$

Then

$\dfrac{d^k f(z)}{dz^k} = f(z),$

and upon making the trial substitution $f(z) = A e^{mz}$, we get

$m^k = 1.$

Taking unity in its complex form

$1 = \cos 2n\pi + i \sin 2n\pi,$

we have that

(12)  $m_n = \sqrt[k]{1} = \cos\dfrac{2n\pi}{k} + i\sin\dfrac{2n\pi}{k}, \qquad n = 0, 1, 2, \dots, k - 1.$

Then

$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}$

and

$f^{(p)}(z) = \sum_{n=0}^{k-1} A_n m_n^p e^{m_n z}.$
Now setting z = 0, we get

$f(0) = A_0 + A_1 + \cdots + A_{k-1} = 0$
$f'(0) = A_0 m_0 + A_1 m_1 + \cdots + A_{k-1} m_{k-1} = 0$
$\cdots$
$f^{(k-1)}(0) = A_0 m_0^{k-1} + A_1 m_1^{k-1} + \cdots + A_{k-1} m_{k-1}^{k-1} = 1,$

k equations to determine the k constants. We know that $A_n$ is equal to the
ratio of two determinants formed from the coefficients of the above equations.
This ratio reduces to

(13)  $A_n = \dfrac{1}{(m_n - m_0)\cdots(m_n - m_{n-1})(m_n - m_{n+1})\cdots(m_n - m_{k-1})}.$

We have, then, an expression for the k constants in terms of the k roots of unity.
Therefore, for any particular value of k we can obtain the sum of our series
from the relation

$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}.$

Hence, under the assumption that k is a positive integer, we have

(14)  $B(t) = N c\,e^{-ct}\sum_{n=0}^{k-1} A_n e^{m_n c t}.$
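The forms of B(t) for k = 1, 2, 3, 4 are respectively

$B(t) = Nc$

$B(t) = \tfrac{1}{2}Nc\,(1 - e^{-2ct})$

$B(t) = \tfrac{1}{3}Nc\left[1 - e^{-3ct/2}\left(\cos\tfrac{\sqrt{3}}{2}ct + \sqrt{3}\,\sin\tfrac{\sqrt{3}}{2}ct\right)\right]$

$B(t) = Nc\,e^{-ct}\left[\tfrac{1}{4}e^{ct} - \tfrac{1}{4}e^{-ct} - \tfrac{1}{2}\sin ct\right].$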
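Eq. (14) is easy to evaluate for integral k. One simplification that is not stated in the note can be used here. The denominator of (13) is the derivative of $m^k - 1$ at $m = m_n$, namely $k m_n^{k-1} = k/m_n$, so $A_n = m_n/k$. The Python sketch below uses this simplification and cross-checks the result against a truncated form of the series (10). The function names and the truncation point are illustrative choices.

```python
import cmath
import math


def renewal_roots_of_unity(t, N, c, k):
    """Eq. (14) for integer k, with A_n = m_n / k (our simplification of eq. (13))."""
    z = c * t
    roots = (cmath.exp(2j * math.pi * n / k) for n in range(k))
    total = sum((m / k) * cmath.exp(m * z) for m in roots)
    return N * c * math.exp(-z) * total.real


def renewal_series(t, N, c, k):
    """Eq. (10) for integer k, truncated before the factorials overflow a float."""
    z = c * t
    return N * c * math.exp(-z) * sum(
        z ** (n * k - 1) / math.factorial(n * k - 1) for n in range(1, 150 // k + 1))


print(renewal_roots_of_unity(2.0, 1.0, 1.0, 4), renewal_series(2.0, 1.0, 1.0, 4))
```

For k = 1, 2, 3, 4 the roots-of-unity form reproduces the four expressions listed above. The printed pair should agree to rounding error.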

Although the above procedure is valuable particularly because it brings to
light something of the nature of our renewal function, the forms derived above
can actually be used to obtain values of B(t) for various values of t. However,
for extensive numerical work a better method is at hand, which does not even
depend on the assumption of an integral value for k.

Let us return once again to equation (10), which may be written in the fol-
lowing form

(15)  $B(t) = N c\sum_{n=1}^{\infty}\dfrac{e^{-ct}(ct)^{nk-1}}{\Gamma(nk)}.$
If k and c are determined by the method of moments (using two moments),
k will not, in general, be a positive integer. However, by using the Tables of
the Incomplete Gamma Function edited by Karl Pearson, one can compute values
of B(t) without much difficulty. In these tables the function I(u, p) is tabulated
for various values of u and p, where I(u, p) is defined by

(16)  $I(u, p) = \dfrac{1}{\Gamma(p + 1)}\int_0^{u\sqrt{p+1}} e^{-v} v^p\,dv.$

If we let $\xi = u_1\sqrt{p + 1} = u_0\sqrt{p}$, then upon integrating by parts we find

(17)  $\dfrac{e^{-\xi}\xi^p}{\Gamma(p + 1)} = I(u_0, p - 1) - I(u_1, p).$


The left hand member of this equation is of the same form as each of the terms 
of the series in brackets in (15). Hence, the value of the renewal function for a 
particular time, t, is directly obtainable by summation of the right hand member 
of (17) for successive significant values of the argument p. 
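The procedure just described can be imitated without printed tables. The Python sketch below computes Pearson's I(u, p) of eq. (16) as the regularized lower incomplete gamma function $P(p + 1, u\sqrt{p+1})$, using its power series. It then sums the right-hand member of (17) with $\xi = ct$ and p = nk - 1 over successive n. The series routine is an assumption of this sketch and is adequate only for the moderate arguments used here. It also assumes k > 1, so that p - 1 stays positive for n = 1. The helper names are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15):
    """P(a, x) by its power series (adequate for the moderate x used here)."""
    if x <= 0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
    total, n = term, 0
    while term > tol * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total


def pearson_I(u, p):
    """Pearson's I(u, p) of eq. (16): P(p + 1, u * sqrt(p + 1))."""
    return reg_lower_gamma(p + 1, u * math.sqrt(p + 1))


def renewal_ratio(t, c, k, max_terms=200):
    """B(t)/N from eq. (15), each term written as a difference of I values, eq. (17)."""
    xi = c * t
    total = 0.0
    for n in range(1, max_terms + 1):
        p = n * k - 1
        total += pearson_I(xi / math.sqrt(p), p - 1) - pearson_I(xi / math.sqrt(p + 1), p)
    return c * total


print(round(renewal_ratio(10, 0.462, 4.62), 4))
```

With the values k = 4.62 and c = .462 used in the example below, the result at t = 10 comes out close to the tabled value. For large t it settles near c/k, the reciprocal of the mean life.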

By way of illustration a numerical example will be considered. The data are
taken from E. B. Kurtz’ book entitled Life Expectancy of Physical Property.
In this book the author makes a study of retirement rates of fifty-two different
types of physical property, and finds that their replacement curves fall into seven
distinct groups. We consider here Group VII, which happens to be the largest
group, embracing seventeen different types of industrial equipment out of the
fifty-two examined. Using Kurtz’ replacement data² we obtain for the values
of the first and second moments


$m = 10.002$
$M_2 = 121.71$

and from these, by the method of moments, we find

$k = 4.62$
$c = .462.$
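For the Type III density the mean is k/c and the variance is k/c². Solving for the parameters gives $k = m^2/(M_2 - m^2)$ and $c = m/(M_2 - m^2)$. The Python sketch below assumes that $M_2$ is the second moment about the origin. That reading reproduces the quoted k and c, but it is an interpretation, not a statement from the note. The function name is illustrative.

```python
def type_iii_by_moments(mean, raw_second_moment):
    """Method of moments for f(x) = c^k x^(k-1) e^(-cx) / Gamma(k)."""
    variance = raw_second_moment - mean ** 2      # k / c^2
    return mean ** 2 / variance, mean / variance  # (k, c)


k, c = type_iii_by_moments(10.002, 121.71)
print(round(k, 2), round(c, 3))   # prints 4.62 0.462
```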

We then proceed to calculate values of B(t)/N by means of Pearson’s Tables,³ ob-
taining the results shown in the following table.


² E. B. Kurtz, Life Expectancy of Physical Property, Ronald Press, 1930, Table 22, page 86.

³ With regard to the method of interpolation employed in the calculations, it should
be mentioned that it was found advisable to use the Mid-panel Central Difference Formula 
(zxtti) on page xii of the introduction to Pearson’s Tables; and that it is quite sufficient 
for our purposes to calculate only first order terms. 
 t     B(t)/N        t     B(t)/N
 0     .0000        10     .1049
 1     .0016        11     .1043
 2     .0103        12     .1028
 3     .0279        13     .1006
 4     .0486        14     .0990
 5     .0714        15     .0994
 6     .0867        16     .1009
 7     .0980        17     .1013
 8     .1039        18     .0992
 9     .1066        19     .0999
                    20     .0993


In conclusion the author wishes to thank Professor S. S. Wilks for various 
suggestions he has made in connection with this note. 

Princeton University,

Princeton, N. J.


ESTIMATES OF PARAMETERS BY MEANS OF LEAST SQUARES 

By Evan Johnson, Jr.

As a criterion for comparing estimates of a parameter of a universe of known
type of distribution, the use of the principle of least squares is suggested. A
criterion may be stated in rather general terms. Its application to any given
problem presumes a knowledge of the distribution functions of the estimates
considered. In the present paper a criterion is set up and applied to the
estimation of the mean and of the square of the standard deviation of a
normal universe.

We shall use the symbol θ to represent a parameter to be estimated. It is
to be remembered that θ is a constant throughout any problem, that it represents
an unknown value, and that observations and functions of observations (called
estimates) are the only variables that occur. We shall use the symbols $x_i$,
i = 1, 2, ..., n, to represent observed values of the variable x of the universe, and
the symbol F to represent a given function of the observations $x_i$.

If we choose to consider a given function F as an estimate of θ, we are then
interested in the error F - θ. This quantity differs from the so-called residual
of least squares theory, since we are here interested in the difference between
computed and true values, rather than in the difference between observed and
computed values. To avoid any possible confusion we shall refer to F - θ
as the error. Over the set of all samples of n observations $x_i$, the distribution
of the errors F - θ is expressed by means of the distribution function f(F),



454 


EVAN JOHNSON, JB. 


which may bo computed from the known distribution function of the universe. 

We shall assume that the function $f(F)$ has been normalized, so that
$\int_a^b f(F)\,dF = 1$, where the interval from $a$ to $b$ includes all possible values of $F$. The integral
$I = \int_a^b (F - \theta)^2 f(F)\,dF$, associated with a given estimate $F$, may be thought
of as the average square error over the set of all samples.

In this notation we shall state a criterion for the judgment of estimates in
either of the two following forms:

Definition 1. Let $f_1$ be the distribution function of $F_1$, and $f_2$ that of $F_2$.
The estimate $F_1$ of $\theta$ will be judged better than the estimate $F_2$ if

$$\int_a^b (x - \theta)^2 f_1(x)\,dx < \int_a^b (x - \theta)^2 f_2(x)\,dx.$$
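Definition 1 can be illustrated numerically. The following minimal sketch is not part of the original note. It approximates the integral $\int (x - \theta)^2 f(x)\,dx$ for an estimate by averaging $(F - \theta)^2$ over many simulated samples from a normal universe. The two competing estimates (sample mean and sample median) and the values $\theta = 5$, $\sigma = 2$, $n = 11$ are illustrative assumptions only.

```python
import random
import statistics


def average_square_error(estimator, theta, sigma, n, reps, rng):
    """Monte Carlo approximation of the integral of (x - theta)^2 f(x) dx,
    where f is the sampling distribution of `estimator` for samples of size n
    drawn from a normal universe with mean theta and standard deviation sigma."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        total += (estimator(sample) - theta) ** 2
    return total / reps


def judged_better(est1, est2, theta, sigma, n, reps=20000, seed=1):
    """Definition 1: est1 is judged better than est2 if its average
    square error is the smaller of the two (both estimated by simulation)."""
    rng = random.Random(seed)
    e1 = average_square_error(est1, theta, sigma, n, reps, rng)
    e2 = average_square_error(est2, theta, sigma, n, reps, rng)
    return e1 < e2, e1, e2


better, ase_mean, ase_median = judged_better(
    statistics.fmean, statistics.median, theta=5.0, sigma=2.0, n=11
)
print(better, round(ase_mean, 3), round(ase_median, 3))
```

With these settings the sample mean should be judged better. Its simulated average square error should lie near $\sigma^2/n = 4/11$. The figures are Monte Carlo approximations, not exact values of the integrals.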


Definition 2. From a given class of functions, of which $F$ is a member, $F$ will
be called the best estimate if

$$I = \int_a^b (F - \theta)^2 f(F)\,dF \qquad (1)$$

is less than the corresponding integral for all other functions of the class.

It is to be observed that the integral $I$ is a function of the quantities $\theta$ and $f$.
From this is seen at once the distinction between the present problem of
minimizing the average square error and the similar problem of finding that point
around which the mean square value of the deviations of a variable is a minimum.
In the problem under consideration we wish to find the function $F$, or more
precisely its distribution function $f(F)$, for which $I$ takes its minimum with a
fixed value of $\theta$. In the alternative problem we have a given distribution $f$
and we wish to find the minimum of $I$ with respect to $\theta$.

A second observation to be made is that the integral $I$ cannot be usefully
minimized in the sense of the general conditions of the calculus of variations.
The problem would be of the isoperimetric variety, with the side condition
$\int_a^b f(x)\,dx = 1$. A solution might be expressed as the limit, as $\epsilon$ approaches zero,
of functions $f(x)$ with proper continuity conditions, such that

$$f(x) = 0 \ \text{when}\ |x - \theta| \ge \epsilon, \qquad \int_{\theta-\epsilon}^{\theta+\epsilon} f(x)\,dx = 1.$$

For each such function $I \le \epsilon^2$, so that $I$ tends to zero in the limit.
Such a solution would be meaningless in practical statistical theory. Solutions
are to be expected, therefore, only in those cases where the class of functions,
from which $F$ is to be selected, is sufficiently restricted.

The two following examples illustrate both restrictions and possible
application of the theory.
As a first example let us consider the problem of finding an estimate $F$ of the
mean, $\bar{x}$, of a normal universe. The mean of a distribution is a symmetric
linear function of the variates of the distribution. For the class of functions
from which to select an estimate $F$ of $\bar{x}$, let us take the class of all symmetric
homogeneous linear functions of the observations $x_i$. Let

$$F = a(x_1 + x_2 + \cdots + x_n). \qquad (2)$$

We wish to find the value of $a$, if any, for which $I$ is a minimum.

$F$ is the sum of $n$ normally distributed independent variables, $a x_i$, each with
standard deviation $a\sigma$. $F$, therefore, has a distribution function

$$f = C \exp\left(\frac{-(F - an\bar{x})^2}{2a^2 n\sigma^2}\right),$$

where $C$ is so chosen that $\int_{-\infty}^{\infty} f\,dF = 1$.


A discussion of general distribution functions may be found in Dunham Jackson's article, "Theory of Small Samples,"
in the American Mathematical Monthly, Volume XLII, 1935. In this case it
can be shown without particular difficulty that

$$I = a^2 n\sigma^2 + \bar{x}^2 (an - 1)^2,$$

the first term being the variance of $F$ and the second the square of its bias $an\bar{x} - \bar{x}$.


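To determine the minimum of $I$ with respect to $a$, we set

$$\frac{dI}{da} = 2an\sigma^2 + 2\bar{x}^2(an - 1)n = 0,$$

and obtain

$$a = \frac{\bar{x}^2}{n\bar{x}^2 + \sigma^2} = \frac{1}{n} \cdot \frac{1}{1 + \sigma^2/(n\bar{x}^2)}. \qquad (3)$$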



It is seen that even for such a simple example as the estimation of the mean
there is no estimate of the form of equation (2), with $a$ independent of the
parameter to be estimated, for which $I$ takes its minimum value.
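The closed form for $I$ and the minimizing value in equation (3) are easy to check numerically. The sketch below is an illustration added here, not the author's computation. It has to be handed the true mean and $\sigma$, which is exactly the difficulty just noted. The values $n = 10$, $\bar{x} = 1$, $\sigma = 2$ are arbitrary assumptions.

The sketch does three things:

- evaluates $I(a) = a^2 n\sigma^2 + \bar{x}^2(an - 1)^2$;
- compares the value of $a$ from (3) with the sample mean ($a = 1/n$);
- cross-checks the formula by simulation.

```python
import random


def avg_sq_error(a, n, mean, sigma):
    """I(a) = a^2 n sigma^2 + mean^2 (a n - 1)^2 for F = a(x_1 + ... + x_n)."""
    return a * a * n * sigma ** 2 + mean ** 2 * (a * n - 1) ** 2


def best_a(n, mean, sigma):
    """Equation (3): a = mean^2 / (n mean^2 + sigma^2).
    It depends on the unknown mean and sigma of the universe."""
    return mean ** 2 / (n * mean ** 2 + sigma ** 2)


def simulated_avg_sq_error(a, n, mean, sigma, reps=20000, seed=2):
    """Monte Carlo estimate of the average of (F - mean)^2 over samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        F = a * sum(rng.gauss(mean, sigma) for _ in range(n))
        total += (F - mean) ** 2
    return total / reps


n, mean, sigma = 10, 1.0, 2.0
a_star = best_a(n, mean, sigma)  # equals 1/14 for these values
print(a_star, avg_sq_error(a_star, n, mean, sigma))    # minimum of I
print(1 / n, avg_sq_error(1 / n, n, mean, sigma))      # the sample mean
print(simulated_avg_sq_error(a_star, n, mean, sigma))  # should be near the minimum
```

For these values the choice from (3) gives $I = 2/7$, against $I = 0.4$ for the sample mean. Here $\sigma^2/(n\bar{x}^2) = 0.4$ is not small, and the comparison shows how far the least-squares criterion then moves away from $1/n$.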

For a distribution in which $\bar{x} \neq 0$ and $\sigma^2/(n\bar{x}^2)$ is small, $a$ is given as a first
approximation by $1/n$. The function $F$ is then merely the mean of the sample
observations. If $\bar{x} = 0$, the required solution is $a = 0$, and there is no best least-squares
estimate of the type of equation (2).

In the case where $\sigma^2/(n\bar{x}^2)$ is not small, as is apt to be the case when $\bar{x}$ is near
zero, the determination of a desirable estimate by least squares requires a
knowledge of the ratio $\sigma^2/\bar{x}^2$, which may perhaps be judged approximately in a special
problem. If this value is assumed known, the required value of $a$ may be found
most easily by rewriting equation (3) in the form


$$a = \frac{1}{n + \sigma^2/\bar{x}^2}. \qquad (4)$$


The second example to be considered is the determination of an estimate of $\sigma^2$
of a normal universe. A comparison with the definition of $\sigma^2$ suggests the
use of a function $F$ given by the equation

$$F = a\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right], \qquad (5)$$


where $\bar{x}$ is the mean of the $n$ observations. The value of $a$ is, of course, to be
determined by minimizing the integral $I$.
$F$ is the sum of the squares of $n$ normally distributed but not independent
variables. It may be shown, however (Jackson, loc. cit.), to be expressible as
the sum of the squares of $n - 1$ independent normally distributed variables, each
with mean zero and standard deviation $\sqrt{a}\,\sigma$. The distribution function for $F$ takes the form

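$$f(F) = C\, F^{(n-3)/2} \exp\left(-\frac{F}{2a\sigma^2}\right), \qquad (6)$$

$F$ taking only positive values, and $C$ is again chosen to normalize $f(F)$. The
integral $I$ may be written

$$I = C \int_0^\infty (F - \sigma^2)^2 F^{(n-3)/2} \exp\left(-\frac{F}{2a\sigma^2}\right) dF.$$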


The integration is most easily accomplished by replacing $F$ by $u^2$, and in terms
of $u$

$$I = 2C \int_0^\infty (u^2 - \sigma^2)^2 u^{n-2} \exp\left(-\frac{u^2}{2a\sigma^2}\right) du.$$

The various steps in the integration will differ for even and odd values of $n$,
but in each case the final result is the same. It is found that

$$I = \sigma^4 \left\{ a^2 (n^2 - 1) - 2a(n - 1) + 1 \right\}. \qquad (7)$$
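Equation (7) can also be reached from the first two moments of $F$. This alternative route is added here for illustration and is not the author's integration. If $F$ behaves like a sum of squares of $n - 1$ independent normal variables with mean zero and standard deviation $\sqrt{a}\,\sigma$, then:

- $E(F) = a\sigma^2(n - 1)$ and $\mathrm{Var}(F) = 2a^2\sigma^4(n - 1)$;
- (7) is $\mathrm{Var}(F) + (E(F) - \sigma^2)^2$ written out.

The simulation below checks these moments and (7) by Monte Carlo. The values $a = 0.5$, $n = 6$, universe mean 3 and $\sigma = 1.5$ are arbitrary assumptions. Agreement of moments is a sanity check, not a proof of the distributional claim.

```python
import random
import statistics


def simulate_F(a, n, mu, sigma, reps, seed=3):
    """Draw `reps` values of F = a * sum((x_i - xbar)^2), each from a fresh
    sample of size n taken from a normal universe with mean mu and s.d. sigma."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        values.append(a * sum((x - xbar) ** 2 for x in xs))
    return values


a, n, mu, sigma = 0.5, 6, 3.0, 1.5
vals = simulate_F(a, n, mu, sigma, reps=40000)

# Moments implied by n - 1 independent squares of N(0, a * sigma^2) variables.
expected_mean = a * sigma ** 2 * (n - 1)
expected_var = 2 * a ** 2 * sigma ** 4 * (n - 1)
mean_F = statistics.fmean(vals)
var_F = statistics.pvariance(vals)

# Equation (7); algebraically equal to expected_var + (expected_mean - sigma^2)^2.
I_formula = sigma ** 4 * (a * a * (n * n - 1) - 2 * a * (n - 1) + 1)
I_simulated = statistics.fmean((v - sigma ** 2) ** 2 for v in vals)

print(mean_F, expected_mean)
print(var_F, expected_var)
print(I_simulated, I_formula)
```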

The value of $a$ which minimizes $I$ is determined from the relation

$$\frac{dI}{da} = \sigma^4 \left\{ 2a(n^2 - 1) - 2(n - 1) \right\} = 0.$$

Dividing by $(n - 1)$, which is not zero in a sample of two or more observations,
we obtain

$$a = \frac{1}{n + 1}. \qquad (8)$$


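In contrast to the previous example we have here an absolute minimum of $I$
with respect to all estimates of the type of equation (5), since the minimizing
value of $a$ does not depend on the unknown $\sigma$. The best least-squares
estimate of this type is, therefore,

$$F = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n + 1}. \qquad (9)$$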
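Equation (7) makes it easy to compare the divisor $n + 1$ of equation (9) with the more familiar divisors $n - 1$ and $n$. Substituting $a = 1/(n-1)$, $1/n$ and $1/(n+1)$ into (7) gives $2\sigma^4/(n-1)$, $(2n-1)\sigma^4/n^2$ and $2\sigma^4/(n+1)$ respectively. The sketch below is added for illustration and assumes $n = 10$ and $\sigma = 2$. It does two things:

- evaluates (7) at the three divisors;
- checks on a grid of $a$ values that $1/(n+1)$ gives the smallest average square error.

```python
def avg_sq_error_var(a, n, sigma):
    """Equation (7): I = sigma^4 {a^2 (n^2 - 1) - 2a(n - 1) + 1}
    for F = a * sum((x_i - xbar)^2)."""
    return sigma ** 4 * (a * a * (n * n - 1) - 2 * a * (n - 1) + 1)


n, sigma = 10, 2.0
divisors = {"n - 1": n - 1, "n": n, "n + 1": n + 1}
for label, d in divisors.items():
    # prints I = 3.5556, 3.0400 and 2.9091 for the three divisors
    print(f"divisor {label:>5}: I = {avg_sq_error_var(1 / d, n, sigma):.4f}")

# Grid check: no a in (0, 0.3] does better than a = 1/(n + 1).
best = avg_sq_error_var(1 / (n + 1), n, sigma)
assert all(best <= avg_sq_error_var(k / 1000, n, sigma) + 1e-12
           for k in range(1, 301))
```

Under this criterion the unbiased divisor $n - 1$ comes out worst of the three. The small bias introduced by $n + 1$ is more than repaid by the reduced variance.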
THE TEACHING OF STATISTICS*

By Harold Hotelling

The very great increase in the teaching of statistics since the First World 
War has been associated on one hand with the development of statistical theory. 
This important series of discoveries has made available more and more powerful
and accurate statistical methods, and has also acquired an intellectual
interest of its own as embodying the modern version of the most important 
part of inductive logic and as providing scope for mathematical and logical 
ingenuity of high order. The increased teaching of statistics has also been 
associated with the rapidly growing applications of statistics in innumerable 
fields, made possible by the development of the theory, by the availability of 
persons having some knowledge of the theory, and by an increasing realization 
of the possibilities of application. Doubtless most students of statistics enter 
upon the subject, not for its intrinsic interest, but with the idea of applying 
statistical methods as a tool to some particular end. This object may be
scientific research or the fulfillment of a requirement for a degree, but is often connected
with some purely practical pursuit offering the ready prospect of a remunerative 
job. But it would be a mistake to ignore those whose interest is more purely 
intellectual, who desire an insight into the peculiar problems of probable
inference and the structure of empirical knowledge, who wish to get a fundamental
acquaintance with one of the most fundamental of subjects, to see and
understand fully the mathematical derivations underlying so much practical and
scientific activity, and perhaps to make their own contributions. 

Of the magnitude of the demand for statisticians there can be no doubt. 
The realization of what statistical methods can do in a multitude of fields has 
gradually led the administrators of government agencies, directors of scientific 
organizations and research institutes, and business men, to employ rapidly 
increasing numbers of persons with some knowledge of statistical methods, and 
to accord an unusual degree of recognition and promotion in many such cases. 
The uses of statistical methods, and especially of sampling theory, are so varied 
that it is scarcely possible in a brief space to give any sort of survey of them. 
They enter, in one form or another, into the research work of the physicist, the 
chemist, the astronomer, the biologist, the psychologist, the anthropologist, 
the medical investigator, the economist, and the sociologist. Meteorology, 
which has lately acquired greatly increased importance, both civil and military, 
is with its masses of numerical observations very much a statistical matter. 
The engineer needs modern statistical methods both in the physical and in the
economic aspects of his plans. The work of W. A. Shewhart has made clear
the central importance of sampling theory in the economic control of quality
of manufactured articles. Business men who use sampling surveys to test
the markets for their products and the effectiveness of their advertising, who
employ statisticians to make up index numbers and forecasts of business
conditions, and whose manufacturing costs and quality are controlled with the
help of recently devised statistical methods, are finding more and more uses for
statisticians. Indeed, it seems as if the exploitation of the business and
manufacturing possibilities of statistical methods has only begun, and that limitless
further fields are coming into view. Insurance has of course always been
essentially dependent on statistics.

* Address at the meeting of the Institute of Mathematical Statistics at Hanover, N. H.,
September 10, 1940.

But the most rapidly growing large class of positions for statisticians is at 
present in governmental activities. For some facts regarding the employment
of statisticians by the federal government I am indebted to Dr. J. M. Thompson.
It appears that the government has about one hundred agencies using statistics, with
almost eight hundred positions broadly classified as statistical or mathematical, 
in addition to more than six thousand generally classified as economists. The 
title “economist” covers many types of work, but much of it is largely
statistical. The nature of the government’s statistical work is varied and extensive.
It includes such work as forecasting revenue from taxes, prices and production 
of agricultural commodities, general demand conditions, and weather. Some 
of the work consists in analyzing the effects of various taxes on other programs. 
In connection with proposed legislation, statisticians serving the lawmakers 
often attempt to outline the probable results of the legislation, as well as to 
assist in setting up definite formulae for carrying out the general policies aimed 
at in Acts of Congress. Administrators as well as lawmakers require statistical 
activities of a high order, exemplified in the Bureau of the Census, the Bureau 
of Agricultural Economics, and others. The scientific activities of the
government, the work of the War Department, and many others that do not at first
sight appear at all statistical, require the services of mathematical statisticians 
of high order. Even the judicial activities call for statistical theory of some 
of the most recently discovered kinds, as for instance in the investigation
recently made of parole procedures. Cities and states, school and port
authorities, employ numerous statisticians for other and widely diverse purposes.

The growing need, demand and opportunity have confronted the educational
system of the country with a series of problems regarding the teaching of
statistics. Should statistics be taught in the department of agriculture,
anthropology, astronomy, biology, business, economics, education, engineering,
medicine, physics, political science, psychology, or sociology, or in all these
departments? Should its teaching be entrusted to the department of
mathematics, or to a separate department of statistics, and in either of these cases
should other departments be prohibited from offering duplicating courses in
statistics, as they are often inclined to do? To what students, and at what
stage of their advancement, should a course in statistics be administered?
Should there be mathematical or other prerequisites? How much of an
investment in a statistical laboratory is warranted? Should courses be primarily
theoretical and mathematical, or should they be made as practical as possible,
equipping the student in the shortest possible time for a job as statistician or
for statistical work in the field with which a particular department is
concerned? What about degrees in statistics? Eclipsing all these in importance,
though it seems to have received too little of the attention of college and
university administrative officers, is the question: What sort of persons should be
appointed to teach statistics?

To pressing practical problems answers are sure to be given either by
considered policy or by processes of historical evolution. The latter are the more
prominent in explaining the statistical teaching we have had. A synoptic
picture of the origins, not many decades ago, of a good deal of it would perhaps
be something like this. A university Department of X, where X stands for
economics, psychology, or any one of numerous other fields, begins to note
toward the end of the pre-statistical era that some of the outstanding work
in its field involves statistics. The quantity and importance of such work are
observed to increase, while at the same time its intelligibility seems to diminish.
Evidently students turned out with degrees in the field of X who do not know
something about statistics are going to be handicapped, and are not likely to
reflect credit on Alma Mater. The department therefore resolves that its
students must acquire at least an elementary knowledge of the fundamentals
of statistics. To implement this principle, it perhaps inserts some
acquaintance with statistics among the requirements for a degree. This situation
naturally calls for the introduction of a course in statistics. Accordingly the
head of the Department of X, in preparing the next Announcement of Courses,
writes:

“X 82. Elements of Statistics. An elementary but thorough
course designed to acquaint students of X with the fundamental
concepts of statistics and their applications in the field of X. The
viewpoint will be practical throughout. Second semester, MWF at 10.

“Instructor to be announced.” 

The problem now arises of finding someone to teach the new course. The 
few well-known statisticians in the country have positions elsewhere from which 
it would be impossible to dislodge them with the bait to be offered; for though 
the department wishes to have statistics taught as an auxiliary to the study of 
X, it feels that there must be no question of the tail wagging the dog, and that 
economy is appropriate in this connection. The members of the department 
of professorial rank do not respond favorably to the suggestion that they should 
themselves undertake to teach the new and unfamiliar course. But every 
university department has a bright graduate student whose placement is an 
immediate problem. Young Jones has already demonstrated a quantitative turn 
of mind in the course on Money and Banking, or in the Ph.D. thesis on which 
he has already made substantial progress, dealing with The Proportion of
Public School Yard Areas Surfaced with Gravel. He may even recall having
had a high-school course in trigonometry. His personality is all that might
be desired. He is a white, Protestant, native-born American. And so, the
"Instructor to be announced" materializes as Jones. 

This earnest young scholar now finds that, in addition to completing his 
thesis, he must look up the literature of statistics and prepare a course in the 
subject. His attention is directed by older members of the department to 
some of the research papers in the field of X involving statistics. He pursues 
"statistics” through the library card catalog and the encyclopedias. He reads 
about census and vital statistics, price statistics, statistical mechanics.
Perhaps he encounters probable errors. Eventually he learns that Karl Pearson
is the great man of statistics, and that Biometrika is the central source of
information. Unfortunately most of the papers in Biometrika and of Pearson’s
writings, while not lacking in vigor, trail off into mathematical discourse of a 
kind with which young Jones feels ill at ease. What he wants is a textbook, 
couched in simple language and omitting all mathematics, to make the subject 
clear to a beginner. Perhaps he finds the impressive books of Yule and Bowley, 
but decides that they are too abstruse. Elderton's “Frequency Curves and
Correlation” is far too mathematical. Jones decides that a simple book on
statistics must be written, and that he will do it if he can ever succeed in
mastering the subject. In the meantime, he contents himself perforce with the less
mathematical writings of Karl Pearson, with applied examples in the field of X,
and with such nonmathematical textbooks as may have been written by other
young men who have earlier trod the same path as that on which Jones is now 
beginning. Somehow or other he gets the class through the course. After 
doing this two or three times, Jones is an experienced teacher of statistics, and 
his services are much in demand. His course expands, takes on a settled form,
and after a while crystallizes into a textbook. At the same time he may be
getting out some research, consisting of studies in the field of X in which
statistical methods play a part. His promotion is rapid. He becomes a Professor
of Statistics, and perhaps an officer in a national association. His textbook 
has a large sale, and is used as a source by other young men writing textbooks 
on statistics. 

The textbooks written in this way form an interesting literary cycle.
Measures of “central tendency” and of dispersion are introduced, and the use of
one as against another of these measures is debated on every ground except 
the criterion that modern research has shown to be the important one, the 
sampling stability. Sampling considerations, indeed, get little attention. 
The urge to simplify by leaving out the more difficult parts of the subject, and 
especially the mathematical parts, is accompanied by pride in the great number 
of examples drawn from real life, that is, actual data that have been collected. 

But the most fascinating feature of this literary cycle is the opportunity it 
offers for research by the standard methods of literary investigation, tracing the 
influence of one author upon another through parallelism of passages, and so
forth. This study is facilitated by the errors introduced in copying.
One outstanding example is in certain formulae connected with the
rank correlation coefficient, derived originally by Karl Pearson in 1907 and
copied from textbook to textbook without adequate checking back. As one
error after another was introduced in this process, the formulae presented to
students (and apparently made the basis of class exercises involving numerical
substitution) became less and less like Pearson’s original equations.
Incidentally, in trying to check this original work of Pearson’s, recent investigation
has raised the suspicion that it is erroneous; at any rate, he does not give a fully 
adequate argument. Thus it may be that the errors in copying, which are so 
useful in examining the history of statistics, never did any harm. The formulae 
in which the students were drilled may have been no worse than they would 
have been if all the copying had been done with more care. 

While this process has been going on in the Department of X, the Y and Z 
Departments have likewise evolved the teaching of statistics. There is some 
interchange of ideas between the various statisticians on the campus and there 
is a catholicity in the copying of textbooks. But by and large, statistics is 
regarded in the Economics Department as a branch of economics, in the
Psychology Department as a part of psychology, and so forth. The astronomer is
inclined to resent the suggestion that his students should be called upon to study
their least squares with anyone but an astronomer. Medical and biological
investigators suspect Economics and Psychology of charlatanry, and do not
look with favor on the idea of turning their own students over to such
departments for instruction in statistics. Most unthinkable of all would be putting
the Department of Education in charge of an essential part of the training of 
scientific students. Thus the courses multiply. 

The fact that it is essentially the same fundamental subject that is being 
taught under various names and with various kinds of notation in different 
departments is often concealed by including the teaching of statistical theory 
in a course whose title and prospectus are more suggestive of applications. A
case in point is that of an economist of my acquaintance, not primarily engaged
in teaching, who some years ago was invited to give a course in Price Forecasting
in the Economics Department of a leading university. He carefully prepared a
series of lectures on this subject, which had been the center of some extended
research he had conducted. A large class enrolled for the course. But soon
after beginning his series of lectures the economist noticed that the class was
growing restive. Upon inquiring what was amiss, he learned that his discourse
was unintelligible to many of them because he was using technical statistical
terms and concepts with which they were not familiar. He thereupon
undertook to use simpler language, and when this did not suffice to convey his
meaning, to explain the statistical notions involved in his work on price forecasting.
More and more his lectures came to deal with the elements of statistics, and less 
and less with price forecasting. At the end of the term he felt that he had 
given the students some elementary knowledge of statistical theory, for which
they had not enrolled and for which he did not feel particularly well qualified,
but had taught them virtually nothing about price forecasting. When the
invitation was repeated the next year, the economist suggested imposing a course
in statistics as a prerequisite for the course in Price Forecasting. This, however,
was vetoed by the head of the Economics Department, who did not believe in 
prerequisites. The Price Forecasting course was not repeated. 

This incident illustrates the evolution of a good deal of statistical teaching. 
At the beginning, the idea is to teach some application, but the teacher soon 
finds himself engaged at much more length than expected with the fundamentals 
of statistical theory and methods. In this way it has come about that a large
number of persons are teaching theoretical statistics who initially had no
intention of doing so, but were concerned with particular applications. The
teaching of statistical theory has been undertaken belatedly and inexpertly because
it was necessary to a discussion of some application originally in view. Thus
it happens that a good deal of teaching of statistics, even of mathematical
statistics, masquerades as something else. 

The obvious inefficiency of overlapping and duplicating courses given
independently in numerous departments by persons who are not really specialists
in the subject leads to the suggestion that the whole matter be taken over by the
Department of Mathematics. This is a promising solution, but it is doomed to
failure if, as has sometimes happened, it means that the teaching of statistics
is put under the jurisdiction of those who have no real interest in it. Moreover,
the teaching of statistics cannot be done appreciably better by mathematicians
ignorant of the subject than by psychologists or agricultural experimenters
ignorant of the subject. The latter indeed have a certain advantage in that the 
problems seem more real and definite to them; they can sense the difference 
between the important and the unimportant questions, even if they cannot 
express the questions in clear mathematical language, and can sometimes arrive 
intuitively at a correct result that leaves the mathematician puzzled. Also, 
they can understand more readily than can the mathematician the examples, 
drawn largely from biological material, which play so important a part in some 
of the leading expository work on statistics, such as R. A. Fisher's Statistical
Methods for Research Workers. The pure mathematician has only one
advantage over the non-mathematical worker in empirical fields: he is able to set about
reading the serious literature of statistical theory. But he must still find this
scattered literature, sort it out from a mass of rubbish, fallacies, and false starts,
and trace it back historically until he can understand the notation and the
presuppositions. He must also contend with the fact that a good deal that is
important in statistics is still a matter of oral tradition, and some consists of
laboratory techniques. In short, he needs a teacher before he himself sets out to
teach the subject. When a Department of Mathematics calls in a young Ph.D., 
however brilliant, to teach statistics as a part or all of his program, the best 
thing it can do, if he has not already had a training in modern statistics, is to 
give him a furlough for a year or two to enable him to go where he can acquire
such a training. 

Qualifications of a good teacher of statistics include, first and foremost, a 
thorough knowledge of the subject. This statement seems trivial, but it has
been ignored in such a way as to bring about the present unfortunate situation.
Mathematicians and others, who deplore the tendency of Schools of Education
to turn loose on the world teachers who have not specialized in the subjects they
are to teach, would do well to consider their own tendency to entrust the
teaching of statistics to persons who not only have not specialized in the subject,
but have no sound knowledge of it whatever. A knowledge of theoretical
statistics is not easy to obtain. There is no comprehensive treatise on the
subject, starting from first principles, and proceeding by sound deductions and
well-chosen definitions to the methods that need to be used in practice. (I 
have been trying for years to write such a treatise, but it has turned out to be a 
bigger task than at first appeared. This is partly because some things formerly 
thought to have been proved turn out, on critical examination, not to be sound, 
and much new research has been necessary.) The literature is scattered through 
journals pertaining primarily to many kinds of applications, and it is only in
recent years that any large proportion of the current contributions to statistical
theory and methods have been gathered into a few periodicals devoted to
statistical theory. On the other hand, the seeker after truth regarding statistical
theory must make his way through or around an enormous amount of trash
and downright error. The great accumulation of published writings on
statistical theory and methods by authors who have not sufficiently studied the
subject is even more dangerous than the classroom teaching by the same people.

A good teacher of statistics needs of course a mathematical background,
including at least an acquaintance with the theory of functions and n-dimensional
euclidean geometry. A good deal of additional algebra and [...]
to be helpful, as well as some differential geometry. But no
mathematics constitutes by itself any approach to sufficiency in the
qualifications of a teacher of statistics. The most essential thing is that the man shall
know the theory of statistics itself thoroughly from the ground up, including [...]
to apply them in various empirical fields. [...] teacher of [...]
and to knowledge of statistical [...]
statistics needs a really intimate [...]
empirical subjects in which statistical methods [...]
important. Sometimes excellent mathematicians [...] that is necessary for [...]
students through failure to get that feeling for applications that is necessary [...]
the term ‘standard deviation’?”, which must be faced by every teacher of
Statistics 1, requires for an intelligent answer a rather thorough understanding 
of modern sampling theory and techniques. The answer, it now seems, is 
not the definition given in most textbooks. In the selection of a statistic to 
represent a parameter, for example in fitting frequency curves or in linkage 
estimation in genetics, the fundamental consideration is connected with the 
sampling distribution, as R. A. Fisher showed in founding the modern theory of
estimation. This is ignored in most of the current teaching of statistics, with 
the result that innumerable students are sent out to waste the money and time 
of their employers by demanding larger samples than are necessary for the
purposes in view, wasting costly information by calculating inefficient statistics
and using tests that are not the most powerful. On the other hand, students of 
statistics who are taught rule-of-thumb methods without their derivations are 
never quite conscious of the exact limitations and assumptions involved, and 
may make unwarranted inferences from samples that are too small or in some 
way violate the conditions underlying the derivations of the formulae. 

A good teacher of statistics must be thoroughly familiar with these recent
advances. He must examine very critically textbook statements unsupported
by full proofs. Even though the students are not capable of following the
complete mathematical argument (indeed, especially if the students are not to
examine it), the instructor needs to give it a critical study. The custom of
omitting proofs, which would not be tolerated in pure mathematics beyond 
a very limited extent, is common in the teaching of statistics, and is excused on 
the ground that the students do not know enough mathematics to understand 
the proofs. Perhaps in some cases a better reason is that the teachers, and the 
authors of the textbooks, do not understand the proofs. In some instances 
no proofs exist, and in some instances no genuine proofs can exist, because the 
methods taught are demonstrably wrong. The custom prevalent in the
teaching of mathematics of going over each proof carefully in the class is, among other
things, a safeguard against infiltration of false propositions. This safeguard is
missing from most of the teaching of statistics, and there has been an infiltration
of errors. Since it is accepted that a great many students need to learn
something about statistical methods without learning enough mathematics to
understand the proofs, it follows that the elementary teaching of statistics to these
students must, if the perpetuation of gross errors is to be avoided, be in the
hands of really competent mathematical statisticians. This is perhaps the
greatest reform needed in the teaching of statistics today. Until the elementary
teaching of statistics is conducted by those with a thorough and critical
knowledge of current research in statistical theory, of a sort that seems virtually
inseparable from participation in that research, there is likely to be a
continuation of the laborious drilling of thousands of students in methods that ought
never to be used. Here, of all places, is the great need for participation of 
research workers in elementary teaching. 

Teachers and textbook writers might well abandon the idea of telling what
statistical methods are used, and say instead what methods ought to be used.
But before they can do this with confidence they must have a very close acquaintance
with the research of the last three decades in statistical theory.

How can an appointing officer know whether a prospective teacher of statistics
knows his subject? This question requires no answer peculiar to statistics in
distinction from other subjects. Publication of research, constituting a contribution
to the particular field, has always been accepted as the best proof. A
substantial contribution to fundamental statistical theory, which is to be distinguished
from the mere application of known statistical methods to empirical
data, is the best indication of the kind of scholarship appropriate to a teacher of
statistics.

Participation in research is not novel as a criterion of what constitutes a good
teacher of a college or university subject, if the subject is Greek literature,
physics, chemistry, biology, or indeed any of those departments that have been
long enough established to attain with respect to the organization of their teaching
a state approximating equilibrium. The more reputable institutions of
higher learning have long maintained the principle, though with occasional
violations in practice, that the Ph.D. degree or its equivalent, representing among
other things the completion of a piece of scholarly research, is a minimum
condition for a regular faculty appointment. It has usually been maintained
also that the Ph.D. thesis should be a new contribution of a strictly scholarly
character to the field of the scholar’s competence, and not merely a routine
application of known methods to an extraneous field. Thus a thesis offered for
the Ph.D. degree in mathematics would be judged by its contribution to mathematics,
rather than to physics or accounting. Moreover the regard in which
universities have held members of their faculties has been intimately connected
with their output of scholarly research. Other criteria of excellence have not
been ignored, but research has been recognized in a fairly consistent manner.
Some say that there has been an over-emphasis on research, and that more attention
ought to be given to other qualities related to teaching. However
this may be, the facts remain that scholarly research is something capable
of a [...] objective evaluation by scholars in the field, that it offers the
main hope of fundamental progress, and that familiarity with it
is a necessary, though not sufficient, condition for the most important teaching
in institutions of higher learning.

[...] theory of
particular applications of the fundamental sciences. Moreover the engineer
might in the course of such teaching refresh his own knowledge of elementary
mathematics, while the physician might gain by renewing his acquaintance with
elementary biology. Such arrangements might occasionally be made with
profit. But if they were the general rule the advantages of specialization would
be lost; the fundamental sciences would not be developed in so well-rounded a
manner as they are by specialists in them, while the special skills and knowledge
of the physician and engineer could not be utilized to the full in their respective
professions. Statistical theory is a big enough thing in itself to absorb the full-time
attention of a specialist teaching it, without his going out into applications
too freely. Some attention to applications is indeed valuable, and perhaps
even indispensable as a stage in the training of a teacher of statistics and as a
continuing interest. But particular applications should not dominate the
teaching of the fundamental science, any more than particular diseases should
dominate the teaching of anatomy and bacteriology to pre-medical students.
These subjects are not ordinarily taught by practicing physicians, but by anatomists
and bacteriologists respectively.

In medical education the principle has been accepted, after a long struggle,
that a medical school should have full-time professors engaged primarily in
teaching and research, and that such professors should not treat patients except
in cases of unusual interest from the standpoint of the science or art of medicine.
An analogous principle would be that an institution offering extensive instruction
in statistics should have full-time professors engaged in the teaching of and
research in statistical theory and methods, without spending time over applied
statistical problems excepting insofar as such problems might present novel
features calling for the development of new statistical methods or theoretical
extensions having interest going beyond the immediate case. Sometimes the
complaint is heard in medical schools that the teaching tends to become too
theoretical on account of detachment from clinical practice, and a similar difficulty
might conceivably develop in connection with statistics; but in neither
case does the trouble seem to be beyond the ability of the personnel involved to
cure if they have the right background.

A specialist in statistics on a university faculty has a threefold function. In
addition to the usual duties of teaching and research, there is a need for him to
advise his colleagues, and other research workers, regarding the statistical
methods appropriate to their various investigations. The advisory function is
a highly important one for the activities of the university as a whole, and should
be taken into consideration in adjusting the teaching load. Probably every
university statistician is visited from time to time by earnest research workers,
deeply engrossed in their respective specialities, speaking technical jargons unfamiliar
to the statistician, and seeking his advice on matters concerning which
he has a sinking feeling of lack of comprehension. After some hours of psychoanalyzing
his visitor the statistician may be able to ascertain what it is he really
wants to know, and thereafter either refer him to some standard formula, or,
more often, undertake a piece of new mathematical research designed to fit
the particular problem, and very possibly having value also for a more extended
class of problems. The statistician is then very likely to find himself embarked
on a co-operative research venture in a field that is new to him.

To function well in this third, the consultative or co-operative function, he
must have an unusually large store of general information. No one stands in
greater need than he of that knowledge of “something about everything and
everything about something” that was once said to be the goal of a liberal
education. In planning the education of statisticians and teachers of statistics
these considerations point to a somewhat wider diffusion of studies among various
fields than is customary in many institutions, especially in graduate work.
The co-operation, and their other work, would also be facilitated if research
workers in general were more strongly urged to get a training in mathematical
statistics at an early stage in their careers.

The problem of departmental organization is secondary to that of getting men
having the requisite qualities of extensive mathematical preparation, a thorough
knowledge of modern theoretical statistics, an understanding of some fields at
least in which statistical methods can be applied, and the type of inquiring
mind sometimes described as a “research outlook.” A Department of Mathematics
may well handle the fundamental teaching in statistics, provided it has
men properly qualified for such teaching. If it does not have such men, its
teaching of statistics and its inability to provide the needed statistical advice
will inevitably tempt the other departments to set up again their own duplicating
courses in what amounts essentially to statistical theory and methods, and
to repeat the mistakes of the past.

A separate Department of Statistics, if competently staffed, could very well
provide advice for the whole institution as well as conducting elementary instruction
in statistical methods and theory, both for students having calculus
and for those without it, and should certainly carry on advanced teaching and
research in statistical theory and methods. But for efficient functioning of the
institution as a whole it should be agreed that the Department of Statistics or
the Department of Mathematics should do all the elementary instruction in
statistics, and that courses in statistics in other departments should be confined
to applications of the basic theory. Normally such courses in applied statistics
in the other departments should require as a prerequisite one or more of the basic
courses in the Department of Statistics, or of Mathematics. The basic course
to be required as a prerequisite to others should be the one which itself requires
calculus as a prerequisite wherever this is practicable. It is practicable for
students of engineering, physics, astronomy, and mathematical economics, since
these students must have calculus anyhow. Moreover the value of the sequence
consisting of calculus, statistical theory and applied statistics, in that
order, is so great that many other students are likely to avail themselves of it
when it is once established and the true nature and value of statistics are more
widely understood.
Exactly how far a Department of Statistics should go in particular applications
would have to be decided anew from time to time by its members in the
light of changing conditions and interests. It cannot teach everything that goes
by the name of statistics. This problem may be exemplified by the case of
population and vital statistics. This is a field with close connections with sociology,
biology, medicine and insurance. It is cultivated in conjunction with
each of these subjects in various places. Some of its most interesting and important
phases make use of quite advanced mathematics, as in the work of
A. J. Lotka, and in addition there is extensive use, and more extensive need, of
the statistical methods centered around sampling theory which are the appropriate
domain of a Department of Statistics. Should the study of population
and vital statistics be included in a Department of Statistics? I think not,
except as a temporary arrangement, or in a small institution, in spite of the
history of the word “statistics,” which originated in connection with material
of this kind, and in one of its meanings is still applied to it. (My use of the
unqualified word “statistics” in this paper is in the sense of theory and methods,
not in the sense of statistical facts such as those found by the census.) Medical,
biological and sociological considerations are prominent in the problems of vital
statistics, and one of these departments might well handle the subject. But
the vital statistician, like other research workers, should have acquired in the
course of his training an intimate familiarity with the statistical theory and
methods which are the appropriate province of a Department of Statistics.
He also needs mathematics through integral equations, if he is to understand and
extend the contributions of Lotka and Volterra. Students of vital statistics
should have had an elementary course in statistical theory in the Department of
Statistics, preferably the course requiring calculus.

A course in price statistics should be taught by an economist, presumably in 
the Department of Economics, but might well require as a prerequisite the same 
elementary courses in statistical theory and methods as would be required in 
psychology, medicine and other fields. In addition, there are problems of time 
series analysis whose treatment calls for a mathematical statistician having some 
acquaintance with both economic and meteorological data. A course on the 
treatment of time series might appropriately be included in the Department of 
Statistics, requiring the general elementary course as a prerequisite, and itself 
serving as a prerequisite for courses in economic and meteorological statistics. 

One of the chief obstacles to efficient organization of teaching is the habit of
not prescribing prerequisites outside one’s own department. But when once
the elementary courses in statistics have become established in the hands of well-equipped
specialists in statistical theory and methods, in whose competence
general confidence can be reposed, the various departments of application will
lose their motive for establishing their own duplicating courses, and will be able
to cultivate more intensively their respective specialities.


The detection of biases and the details of practical statistical work vary greatly
from one application to another. These, consequently, are matters for the departments
concerned with applications rather than with the fundamentals of
statistics, and should not be the chief features of a course in elementary statistical
methods and theory. The work of a Department of Statistics should be
concerned largely with sampling theory, and should emphasize the unity of
statistical methods and theory, regardless of the field of application. It should
deal with statistics as a coherent science of inductive inference, of the preparation
of observations for inference, and of the planning of investigations so as to
yield observations from which inferences can best be made.

The question what mathematical prerequisites should be established for the
fundamental course in statistical theory must be answered by a compromise
between the ideal and what is expedient at a particular time and place. In
Europe a large number of students have had a year of calculus before coming to
universities, that is, before reaching the age of eighteen. If a university were
willing to restrict its entrants to such students (thus automatically solving the
problem of overcrowding) it could give them another year of calculus, mixed
perhaps with advanced algebra and geometry, and then in their sophomore year
give them a thorough course in elementary statistics and probability, based on
calculus. These students would then be ready to tackle advanced statistics in
the third year in a really effective way. If the teaching of economic theory,
physics, chemistry and astronomy were geared to this program in such a way as
to make real use of the calculus, the work in these subjects could be made far
more efficient, in the sense that more material could be covered effectively in
the allotted time, or an equivalent amount of material in less time. If, in addition,
all the many departments in which statistical methods and theory are used
required these statistical courses as prerequisites, and actually used the materials
of these courses in their work, there would be a further huge gain in efficiency.
The baccalaureate degree of such an institution would represent a far
more thorough knowledge, and command of the tools of research, than is possible
without an arrangement putting in this way the fundamentals first.

Institutions unwilling to undertake such a drastic improvement must face
more or less delay and inadequacy in the acquisition by their students of the
fundamentals of mathematics and of statistics. A division of the students into
groups according to mathematical ability ought to be undertaken, and followed
by a corresponding division of the elementary statistics course. Students having
high mathematical ability could begin the study of statistics after completing
calculus, and could look forward to rising ultimately to greater heights in pursuits
involving mathematical or statistical knowledge than those of lesser mathematical
talents. For these latter there would still be the possibility of acquiring,
even without calculus, useful statistical tools; but it is essential that this
should be done under the guidance of instructors thoroughly familiar with the
mathematics of statistics. The task of leading the blind must not be turned
over to the blind. Students possessing the ability to master the calculus should
be encouraged to begin the study of statistics with the course having calculus
as a prerequisite, and should not be put into the necessarily slower group not
having the calculus. I believe that these elementary courses should begin with
the theory of probability, but should go on to the chief distribution functions
used in practice, and should include applied problems and work on calculating
machines.

Putting a sound program of statistical teaching into effect will take time,
partly because of the scarcity of suitable teachers of statistics. Nevertheless,
the process is well under way, and the prospects are good for substantial improvements
in the teaching of statistics. A body of able young research men
possessing the requisite knowledge of statistical fundamentals is now in existence
and is growing. Some of the recent textbooks represent striking improvements.
The Institute of Mathematical Statistics itself, with the Annals of Mathematical
Statistics, is perhaps the best evidence of a changed view making for better
things.

Columbia University,

New York, N. Y. 


DISCUSSION OF PROFESSOR HOTELLING’S PAPER 

By W. Edwards Deming

It is a pleasure to endorse Professor Hotelling’s recommendations; in fact we 
have been following them pretty closely in the courses in the Graduate School 
of the Department of Agriculture. As a matter of fact, he has indirectly played 
an influential part in building up this set of courses, because some of our best 
instructors are his former students. 

Listening to Professor Hotelling’s paper, I was thinking of the possibility
that some of his recommendations might be misunderstood. I take it that they
are not supposed to embody all that there is in the teaching of statistics, because
there are many other neglected phases that ought to be stressed. In the Bureau
of the Census the population division alone has augmented its force by approximately
3500 statistical clerks during the past six months. They come from
diverse schools and it has been interesting to observe how many of them have the
idea that all the problems of sampling and inference from data can be solved by
what are commonly known as modern statistical techniques (correlation coefficients,
rank correlation coefficients, chi-square, analysis of variance, confidence
limits, and the like). Most of them are shocked to learn that many of
the so-called modern “theories of estimation” are not theories of estimation at
all, but are rather theories of distribution and are a disappointment to one who is
faced with the necessity of making a prediction from his data, i.e., of basing
some critical course of action on them. The conviction that such devices as
confidence limits and Student’s t provide a basis for action regardless of the
size of the sample whence they were computed, even under conditions of statistical
control, is too common a fallacy. On the other hand, many simple but
worthy devices are neglected. A histogram, for instance, can be a genuine
tool of prediction if it is built up layer by layer in different legends so as to distinguish
the different sources whence the data are derived. The modern student,
and too often his teacher, overlook the fact that such a simple thing as a scatter
diagram is a more important tool of prediction than the correlation coefficient,
especially if the points are labeled so as to distinguish the different sources of the
data. Most students do not realize that for purposes of prediction the consistency
or lack of it between many small samples may be much more valuable
than any probability calculations that can be made from them or from the entire
lot. Students are not usually admonished against grouping data from heterogeneous
sources. Of those that are not guilty of indiscriminate grouping, many
are inclined to rely on statistical tests for distinguishing heterogeneity, rather
than on a careful consideration of the sources of the data. Too little attention
is given to the need for statistical control, or to put it more pertinently, since
statistical control (randomness) is so rarely found, too little attention is given
to the interpretation of data that arise from conditions not in statistical control.

Nevertheless, the fundamentals of probability and sampling theory, and the 
mathematics of the distribution functions, though by themselves they do not 
qualify anyone for high-grade statistical work, are ultimately essential for proficiency
in statistics. Since they are seldom learned away from the university
they are properly made the main theme of teaching. The university is the 
place to learn the studies that are so difficult to get outside of it. 

Above all, a statistician must be a scientist. The skepticism of many first-class
scientists of today for modern statistical methods should be a challenge to
statistical teaching. A scientist does not neglect any pertinent information,
yet students of statistics are often taught to do just the opposite of this, and are
accused of being old-fashioned for daring to think of combining experience with
the new information provided by a sample, even if it is a pitifully small one.
Statisticians must be trained to do more than to feed numbers into the mill and
grind out probabilities; they must look carefully at the data, and take account
of the conditions under which each observation arises. It is my feeling that
the chief duty of a statistician is to help design experiments in such a way
that they provide the maximum knowledge for purposes of prediction; another
is to compile data with the same object in view; and still a third function is
to help bring about some changes in the source of the data. Scientific data
are not taken merely for inventory purposes. There is no use taking data if
you don’t intend to do something about the sources whence they arise.

Bureau of the Census,

Washington
RESOLUTIONS ON THE TEACHING OF STATISTICS

The Institute of Mathematical Statistics at its business meeting on September 
11, 1940 at Dartmouth College adopted the following resolutions regarding the 
teaching of statistics. The resolutions were drawn up by a committee appointed 
by the President, and consisting of Burton H. Camp, W. Edwards Deming, 
Harold Hotelling, and Jerzy Neyman. 

1. If the teaching of statistical theory and methods is to be satisfactory, it 
should be in the hands of persons who have made comprehensive studies of the 
mathematical theory of statistics, and who have been in active contact with 
applications in one or more fields. 

2. The judgment of the adequacy of a teacher’s knowledge of statistical 
theory must rest initially on his published contributions to statistical theory, in 
contrast with mere applications, in a manner analogous to that long accepted in 
other university subjects. 

3. These ideas are expressed in detail in the paper The teaching of statistics,
by Professor Harold Hotelling, and the Institute decides to give both the 
resolution and the paper as wide a circulation as possible. 



REPORT OF THE HANOVER MEETING OF THE INSTITUTE

The sixth meeting of the Institute of Mathematical Statistics was held at
Dartmouth College, Hanover, New Hampshire, Tuesday to Thursday, September
10 to 12, 1940, in conjunction with meetings of the American Mathematical
Society and of the Mathematical Association of America. The following
forty-two members of the Institute attended the meeting:


H. E. Arnold, Felix Bernstein, G. W. Brown, J. H. Bushey, B. H. Camp, A. T. Craig,
A. E. Crathorne, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, Churchill Eisenhart,
H. L. Elveback, C. H. Fischer, M. M. Flood, R. M. Foster, T. C. Fry, H. P. Geiringer,
Robert Henderson, E. H. C. Hildebrandt, G. M. Hopper, Harold Hotelling, E. V. Huntington,
M. H. Ingraham, Dunham Jackson, W. L. Kichline, L. F. Knudsen, B. A. Lengyel,
W. G. Madow, J. W. Mauchly, Richard von Mises, E. B. Mode, Jerzy Neyman, P. S. Olmstead,
Oystein Ore, M. M. Sandomire, L. W. Shaw, F. F. Stephan, A. G. Swanson, Abraham Wald,
S. S. Wilks, Jacob Wolfowitz.


The meeting of the Institute consisted of four sessions. At the first session,
which was held on Tuesday morning, Professor Harold Hotelling of Columbia
University delivered an address on The Teaching of Statistics. This address
was followed by considerable discussion on the various aspects of the teaching
of statistics.¹ Preceding Professor Hotelling’s address a short paper on an
Empirical Comparison of the “Smooth” Test for Goodness of Fit with Pearson’s
Chi-Square Test was presented by Professor J. Neyman of the University of
California.

Following Professor Hotelling’s address a business meeting of the Institute
was held. At this time resolutions on the teaching of statistics were approved
(see p. 472). The President reported that a War Preparedness Committee
had been appointed in the summer to study the matter of the Institute’s participation
in the national defense program.² The Chairman of this Committee
submitted a preliminary report which met the approval [...]
plan was approved for completing the report and circularizing it with a minimum
[...]

[...] of the organization of local sections or chapters of the Institute
was discussed but no action was taken.

¹ Professor Hotelling’s address and [...] are published in the present issue of the Annals.

² The membership of the Committee is as follows:

Professor Churchill Eisenhart (Chairman), University of Wisconsin.

Professor A. T. Craig, University [...]

Professor E. G. Olds, Carnegie [...]

Captain Leslie E. Simon, Aberdeen [...]

Mr. Ralph E. Wareham, General Electric Company.
On Tuesday afternoon a session on contributed papers in Mathematical
Statistics was held jointly with the American Mathematical Society. Professor
B. H. Camp of Wesleyan University presided and the following papers
were presented.

1. Contributions to the theory of the representative method of sampling.

Dr. W. G. Madow, Department of Agriculture, Washington.

2. A generalization of the law of large numbers. 

Dr. Hilda P. Geiringer, Bryn Mawr College.

3. On the problem of two samples from normal populations with unequal variances. 

Professor S. S. Wilks, Princeton University.

4. Experimental determination of the maximum of an empirical function.

Professor Harold Hotelling, Columbia University.

5. Asymptotically shortest confidence intervals.

Dr. Abraham Wald, Columbia University.

6. Reduction of certain composite statistical hypotheses. 

Dr. G. W. Brown, R. H. Macy and Company, Inc., New York.

7. Conception of equivalence in the limit of tests and its application to certain λ and χ²
tests.

Professor J. Neyman, University of California.

Abstracts of these papers follow this report.

On Wednesday morning a session was held on The Theory of Probability
with Dr. T. C. Fry of the Bell Telephone Laboratories in the chair. The
following addresses were given:

1. On the foundations of probability theory. 

Professor R. von Mises, Harvard University. 

2. Probability as measure. 

Professor J. L. Doob, University of Illinois. 

This session was followed by an energetic discussion which was continued in an 
informal afternoon session. 

The Thursday morning session was devoted to the Theory of Statistical Estimation
with Professor Harold Hotelling as Chairman. The following addresses
were given:

1. Estimation by intervals as a classical problem in probability. 

Professor J. Neyman, The University of California.

2. Statistical estimation in large samples.

Dr. Joseph F. Daly, The Catholic University of America.

On Monday at 4:15 p.m. a tea was held at the Graduate Club for members 
of the mathematical organizations and their guests, and on Monday at 8:00 a 
musical performance was presented. On Tuesday at 7:00 p.m. a joint dinner
was held for the mathematical organizations in Thayer Hall. Wednesday 
afternoon was devoted to an excursion to Franconia Notch. 

During the meeting a collection of string models of ruled surfaces was exhibited
by Professor Robin Robinson of Dartmouth College and electrical
calculation apparatus made from telephone equipment was exhibited by members
of the staff of the Bell Telephone Laboratories.
Contributions to the Theory of the Representative Method of Sampling.
William G. Madow, Washington, D. C.


The theory of representative sampling may be regarded as a dual sampling process, the
first of which consists in the sampling of different random variables and the second of which
consists in repeating several times the experiments associated with each of the different
random variables. It follows that while the theory of sampling from finite populations
without replacement may be required for the first process, the second leads directly into
the theory of sampling from infinite populations. There is, however, one difference.
Although the usual theory is concerned with the evaluation of fiducial or confidence limits
for parameters, the theory of sampling is concerned with the evaluation of fiducial or confidence
limits for, say, the mean of a sample of N when n (N > n) of the values are known.

It is thus possible to use the usual theories of estimation in obtaining estimates of the
parameters and to allow the effects of the subsampling process to show themselves in the
different values of the fiducial limits. It is shown that the limits obtained are almost
identical with those obtained by the theory of sampling from a finite population. Distributions
of the statistics used in these limits are derived.

Besides these results, the theory is extended to the theory of sampling vectors, and conditions
are stated under which the “best” allocation of the number in a sample among several
strata is proportional to the kth roots of the generalized variance of a random vector
having k components.

A Generalization of the Law of Large Numbers. Hilda Geiringer, Bryn Mawr.

Let y.fil Vifi). • • ■ , 7„(a:) be n probability distributions which are not supposed to 
be independent and let , ••• , x.) be a "statistical function” of n observation 

in the sense of v. Mi8es,-7.(x) (i = 1, 2, ••• n) indicating as usual the probability of 
getting a result ^ x at the fth observation-. Then it can be proved th&t under fairly 
general conditions Fix,, x,, •••,*„) converges stochashcally toward its 
value”, or m other words, that under these general conditions o great class o/ statistics 
Fix, , X,, .. • , x«) 18 "consistent" in the sense of R. A. Fisher. 

Well-known particular cases of this theorem result if (a) we take for $F(x_1, x_2, \dots, x_n)$
the average $(x_1 + x_2 + \dots + x_n)/n$ of the $n$ observations, and (b) we assume that the
$V_i(x)$ are independent distributions.
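Case (a) can be watched numerically even when consecutive observations are not independent, which is the situation the abstract allows. The sketch below is only an illustration, not Geiringer's theorem or its conditions: it assumes one particular dependent sequence (a first-order moving average of independent standard normal variables, whose mean is 0) and shows the average of the first $n$ terms settling near 0 as $n$ grows. The function name is hypothetical.

```python
import random


def average_of_dependent_sequence(n, theta=0.5, seed=1):
    """Average of n observations from the moving-average sequence
    x_t = z_t + theta * z_{t-1}, where the z_t are independent N(0, 1).

    Neighbouring x_t are correlated, so the observations are not independent,
    yet every x_t has mean 0 and the average still converges to 0."""
    rng = random.Random(seed)
    prev = rng.gauss(0.0, 1.0)
    total = 0.0
    for _ in range(n):
        cur = rng.gauss(0.0, 1.0)
        total += cur + theta * prev
        prev = cur
    return total / n


for n in (100, 10_000, 100_000):
    print(n, average_of_dependent_sequence(n))
```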

On the Problem of Two Samples from Normal Populations with Unequal Variances.
S. S. Wilks, Princeton University.

Suppose two samples, of $n$ and $m$ elements respectively, are drawn from

It is therefore impossible to obtain exact confidence limits
for $a_1 - a_2$ corresponding to a given confidence coefficient. Functions of the four
parameters and four statistics are devised from which one can set up confidence limits for
$a_1 - a_2$, with associated confidence coefficient inequalities.

Experimental Determination of the Maximum of an Empirical Function.
Harold Hotelling, Columbia University.

In physical and economic experimentation to determine the maximum of an unknown
function, for example of a monopolist's profit as a function of price, or of the magnetic
permeability of an alloy as a function of its composition, the characteristic procedure is to
perform experiments with chosen values of the argument $x$, each of which then yields an
observation, subject to error, on the corresponding functional value $y = f(x)$. The values
of $x$ need, however, to be chosen on the basis of earlier experiments in order to make the
determination efficient. The experimentation properly proceeds, therefore, in successive
stages, with the values used at each stage determined with the help of the earlier work.
The question of what distribution of $x$, as a function of previous results, should be used is
discussed in this paper on the basis of various hypotheses regarding the function, and
further criteria. In particular, a conflict is shown to exist under some conditions between
the criterion of minimum sampling variance and that calling for absence of bias.
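The abstract does not say which design Hotelling recommends, so what follows is only a toy version of experimentation "in successive stages", under assumptions of our own: the response is modelled by a parabola, each stage observes the function at equally spaced points around the current estimate of the maximum, and the spread of the points is halved between stages. It does not reproduce the paper's analysis of sampling variance versus bias; `staged_argmax` and the example profit curve are hypothetical.

```python
import numpy as np


def staged_argmax(f, center, spread, n_stages=3, points_per_stage=5, noise_sd=0.0, seed=0):
    """Illustrative multi-stage search for the maximiser of an unknown function.

    At each stage: observe f (plus optional normal noise) at equally spaced
    points around the current centre, refit a parabola to all observations so
    far, move the centre to the fitted vertex if the fit is concave, and halve
    the spread. The quadratic model is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    xs_all = np.empty(0)
    ys_all = np.empty(0)
    for _ in range(n_stages):
        xs = center + spread * np.linspace(-1.0, 1.0, points_per_stage)
        ys = f(xs) + rng.normal(0.0, noise_sd, size=xs.shape)
        xs_all = np.concatenate([xs_all, xs])
        ys_all = np.concatenate([ys_all, ys])
        a, b, _ = np.polyfit(xs_all, ys_all, 2)  # coefficients, highest power first
        if a < 0:  # concave fit: its vertex is a maximum
            center = -b / (2.0 * a)
        spread /= 2.0
    return center


def profit(price):
    """Hypothetical profit curve with its maximum at price 2."""
    return 5.0 - (price - 2.0) ** 2


print(staged_argmax(profit, center=0.0, spread=3.0))
print(staged_argmax(profit, center=0.0, spread=3.0, noise_sd=0.5, seed=1))
```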


Asymptotically Shortest Confidence Intervals. Abraham Wald, Columbia 
University. 


Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown
parameter $\theta$. Denote by $x_1, \dots, x_n$ $n$ independent observations on $x$, and let
$C_n(\theta)$ be a positive function of $\theta$ such that the probability that

$$\left| \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial \theta} \log f(x_\alpha, \theta) \right| < C_n(\theta)$$

is equal to a constant under the assumption that $\theta$ is the true value of the parameter.
Denote by $\theta'(x_1, \dots, x_n)$ the root in $\theta$ of the equation

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial \theta} \log f(x_\alpha, \theta) = C_n(\theta)$$

and by $\theta''(x_1, \dots, x_n)$ the root of

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial \theta} \log f(x_\alpha, \theta) = -C_n(\theta).$$

Under some weak assumptions on $f(x, \theta)$ the interval
$\delta_n(x_1, \dots, x_n) = [\theta'(x_1, \dots, x_n), \theta''(x_1, \dots, x_n)]$
is, in the limit as $n \to \infty$, a shortest unbiased confidence interval¹ of $\theta$ corresponding to
this confidence coefficient. This confidence interval is identical with that given by S. S.
Wilks in his paper "Shortest average confidence intervals from large samples," The Annals
of Mathematical Statistics, Sept. 1938. Wilks has shown that $\delta_n(x_1, \dots, x_n)$ is
asymptotically shortest on the average compared with all confidence intervals computed on the
basis of statistics belonging to a certain class $C$. In the present paper it has been proved
that the confidence interval in question is asymptotically shortest compared with any
arbitrary unbiased confidence interval, without any restriction to a certain class of
functions.
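The two defining equations for $\theta'$ and $\theta''$ can be solved numerically once the score $\partial \log f(x, \theta) / \partial \theta$ and $C_n(\theta)$ are given. The sketch below uses bisection and assumes each equation has exactly one sign change on the search bracket. For the check it assumes a normal population with unknown mean and known unit variance: then the score is $x - \theta$, the standardized sum is exactly $N(0, 1)$, and the constant $C_n(\theta) = 1.96$ gives coefficient 0.95, so the roots should be $\bar{x} \mp 1.96/\sqrt{n}$. The helper name `score_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def score_interval(score, c_n, xs, lo, hi, tol=1e-10, max_iter=200):
    """Return (theta_prime, theta_double_prime): the roots in theta of
        S(theta) = +C_n(theta)   and   S(theta) = -C_n(theta),
    where S(theta) = n**-0.5 * sum of score(x, theta) over the sample and
    score(x, theta) is d/dtheta log f(x, theta).
    Each root is found by bisection on [lo, hi], assuming one sign change."""
    n = len(xs)

    def g(theta, sign):
        s = sum(score(x, theta) for x in xs) / math.sqrt(n)
        return s - sign * c_n(theta)

    def bisect(sign):
        a, b = lo, hi
        fa = g(a, sign)
        for _ in range(max_iter):
            m = 0.5 * (a + b)
            fm = g(m, sign)
            if (fm > 0) == (fa > 0):
                a, fa = m, fm  # root lies in [m, b]
            else:
                b = m  # root lies in [a, m]
            if b - a < tol:
                break
        return 0.5 * (a + b)

    return bisect(+1), bisect(-1)


# Assumed model: N(theta, 1). C_n(theta) = z makes the probability 0.95.
z = NormalDist().inv_cdf(0.975)
xs = [0.2, -0.4, 1.1, 0.5]
lower, upper = score_interval(lambda x, t: x - t, lambda t: z, xs, lo=-10.0, hi=10.0)
print(lower, upper)  # mean(xs) - z/sqrt(n) and mean(xs) + z/sqrt(n)
```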


Reduction of Certain Composite Statistical Hypotheses. George W. Brown, 
R. H. Macy and Co., New York. 

The results obtained make it possible to reduce a large class of composite statistical
hypotheses to equivalent simple hypotheses. The fundamental theorem established states
essentially that if two distributions give rise, in sampling, to the same distribution of the
set of differences between observations, then one distribution must be a translation of the
other, subject to a condition requiring that the characteristic function of one of the
distributions be such that any interior intervals of zeros are not too large. The result is
established by means of the functional equation

$$\varphi(t_1)\,\varphi(t_2)\,\varphi(-t_1 - t_2) = \psi(t_1)\,\psi(t_2)\,\psi(-t_1 - t_2)$$

relating the characteristic functions $\varphi$ and $\psi$ of the two distributions. Similar
results are obtained for scale, for a combination of location and scale, and for the
corresponding situations in multivariate distributions. This type of uniqueness theorem
permits one to reduce a composite hypothesis involving an unknown location parameter (or
scale, or both) to an equivalent simple hypothesis.

¹ For the definition of a shortest unbiased confidence interval see the paper by J. Neyman,
"Outline of a theory of statistical estimation based on the classical theory of probability,"
Phil. Trans. Roy. Soc., (1937).
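For three independent observations with common characteristic function $\varphi$, the joint characteristic function of the differences $(x_1 - x_3, x_2 - x_3)$ is $\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2)$. The sketch below checks numerically only the easy direction of Brown's result: translating a normal distribution leaves this product unchanged, while rescaling it does not. The converse (same differences imply a translation) is what the paper proves and is not demonstrated here; the normal family and the helper names are our own choices.

```python
import cmath


def normal_cf(mu, sigma=1.0):
    """Characteristic function of N(mu, sigma**2), returned as a function of t."""
    return lambda t: cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)


def differences_cf(cf, t1, t2):
    """Joint characteristic function of (x1 - x3, x2 - x3) for three independent
    observations with characteristic function cf:
    E exp(i(t1 (x1 - x3) + t2 (x2 - x3))) = cf(t1) cf(t2) cf(-t1 - t2)."""
    return cf(t1) * cf(t2) * cf(-t1 - t2)


for t1, t2 in [(0.3, -1.2), (1.0, 0.5), (-0.7, 0.4)]:
    base = differences_cf(normal_cf(0.0), t1, t2)
    shifted = differences_cf(normal_cf(3.0), t1, t2)        # translated by 3
    rescaled = differences_cf(normal_cf(0.0, 2.0), t1, t2)  # scale changed instead
    print(abs(base - shifted), abs(base - rescaled))
```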

Conception of Equivalence in the Limit of Tests and Its Application to Certain
$\lambda$- and $\chi^2$-Tests. J. Neyman, University of California.

Denote by $E$ a system of observable variables and by $N$ the number of independent
observations of those variables to be used for testing a certain statistical hypothesis $H$
against a set $\Omega$ of admissible simple hypotheses $h$. Let further $T_1(N)$ and $T_2(N)$
be two different tests of $H$ using the same number $N$ of observations. Consider the
probability $P_N(h)$, calculated on any admissible simple hypothesis $h$, of the two tests
contradicting each other.

Definition: If, whatever be $h \in \Omega$, the probability $P_N(h)$ tends to zero as $N$ is
indefinitely increased, then the two tests are said to be equivalent in the limit.

Consider a number $s$ of series of independent trials and denote by $E_{i1}, E_{i2}, \dots, E_{im_i}$
all the $m_i$ possible and mutually exclusive outcomes of each of the trials forming the $i$th
series. Let $p_{ij}$ be the probability of $E_{ij}$, $n_i$ the total number of trials in the $i$th
series, and $n_{ij}$ the number of these which give the outcome $E_{ij}$.

Suppose that it is desired to test a composite hypothesis $H$ concerning all the
probabilities $p_{ij}$ and consisting of the assumption that each of them is a given linear
function of some $t$ independent parameters $\theta_1, \dots, \theta_t$, so that

$$p_{ij} = a_{ij0} + a_{ij1}\theta_1 + \dots + a_{ijt}\theta_t, \tag{1}$$

where the coefficients $a_{ijk}$ are known. The main result of the paper is then that the
$\lambda$-test of the above hypothesis $H$, tested against the set $\Omega$ of alternatives
ascribing to the $p_{ij}$ any non-negative values, is equivalent in the limit to the test
consisting of rejecting $H$ when the minimum of the expression

$$\chi^2 = \sum_{i=1}^{s} \sum_{j=1}^{m_i} \frac{(n_{ij} - n_i p_{ij})^2}{n_{ij}}, \tag{2}$$

calculated with respect to unrestricted variation of the $\theta$'s, exceeds the tabled value of
$\chi^2$ corresponding to the chosen level of significance $\alpha$ and to the number of degrees
of freedom

$$\sum_{i=1}^{s} m_i - s - t.$$

It will be noticed that the expression (2) differs from the usual $\chi^2$ in the denominator
of each term: the observed frequency $n_{ij}$ takes the place of the expected frequency
$n_i p_{ij}$.
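To make the difference in denominators concrete, here is a minimal sketch that evaluates both forms for given cell probabilities $p_{ij}$, without any minimization over the $\theta$'s. The helper name is hypothetical, and the observed-count denominator means Neyman's form is undefined (Python raises `ZeroDivisionError`) when some $n_{ij} = 0$.

```python
def chi2_statistics(series):
    """series: list of (counts, probs) pairs, one pair per series i, where
    counts[j] = n_ij and probs[j] = p_ij.

    Returns (pearson, neyman): Pearson's form divides each squared deviation
    by the expected count n_i p_ij; Neyman's form (2) divides by the observed
    count n_ij."""
    pearson = 0.0
    neyman = 0.0
    for counts, probs in series:
        n_i = sum(counts)
        for n_ij, p_ij in zip(counts, probs):
            expected = n_i * p_ij
            pearson += (n_ij - expected) ** 2 / expected
            neyman += (n_ij - expected) ** 2 / n_ij
    return pearson, neyman


# One series of 100 trials with two outcomes, tested against p = (0.4, 0.6).
print(chi2_statistics([([30, 70], [0.4, 0.6])]))  # roughly (4.17, 4.76)
```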

As an example of the application of the test based on (2), consider the case where $M$
varieties of sugar beet are tested for resistance to a certain disease in an experiment
arranged in $N$ randomized blocks. Denote by $n$ the number of beets selected at random
for inspection from each plot and by $n_{ij}$ the number of those of the $i$th variety from the
plot in the $j$th block which are found to be infected. Denote further by $p_{ij}$ the proportion
of infected beets of the $i$th variety in the plot in the $j$th block. The hypothesis that the
effects of variety and of block are additive is expressed by $p_{ij} = p + \gamma_i + \delta_j$, with
$\sum_i \gamma_i = \sum_j \delta_j = 0$. To test this hypothesis we may use (2), which in this
particular case reduces to

$$\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij} (y_{ij} - p - \gamma_i - \delta_j)^2, \tag{3}$$

with $w_{ij} = n^3 / [n_{ij}(n - n_{ij})]$ and $y_{ij} = n_{ij}/n$. The minimum $\chi_0^2$ of $\chi^2$ is found
by solving a set of equations which are linear in $p$, $\gamma_i$, $\delta_j$, and the comparison of
$\chi_0^2$ with the tabled value corresponding to $(M - 1)(N - 1)$ degrees of freedom will tell us
whether or not we are likely to be very wrong in assuming additivity. In the favorable case we
may next proceed similarly to test another hypothesis, that there is no differentiation between
the varieties, so that $\gamma_1 = \gamma_2 = \dots = \gamma_M = 0$.
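Because expression (3), $\sum_{i}\sum_{j} w_{ij}(y_{ij} - p - \gamma_i - \delta_j)^2$ with $y_{ij} = n_{ij}/n$ and $w_{ij} = n^3/[n_{ij}(n - n_{ij})]$, is a weighted sum of squares that is linear in $p$, $\gamma_i$, $\delta_j$, its minimum can be found by weighted least squares. The sketch below imposes $\sum \gamma_i = \sum \delta_j = 0$ through sum-to-zero coding and requires $0 < n_{ij} < n$ in every cell so that all weights are finite. The helper name `additive_fit` is hypothetical; the resulting $\chi_0^2$ would then be compared with the tabled $\chi^2$ on $(M-1)(N-1)$ degrees of freedom.

```python
import numpy as np


def additive_fit(counts, n):
    """Minimise sum_ij w_ij (y_ij - p - gamma_i - delta_j)^2 subject to
    sum(gamma) = sum(delta) = 0, with y_ij = n_ij / n and
    w_ij = n**3 / (n_ij (n - n_ij)).

    counts: M x N array of infected counts n_ij (variety i, block j).
    Returns (chi2_min, p, gamma, delta, degrees_of_freedom)."""
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape
    y = (counts / n).ravel()
    w = (n ** 3 / (counts * (n - counts))).ravel()
    rows = []
    for i in range(M):
        for j in range(N):
            # Sum-to-zero coding: the last gamma (delta) is minus the sum of the others.
            g = np.eye(M - 1)[i] if i < M - 1 else -np.ones(M - 1)
            d = np.eye(N - 1)[j] if j < N - 1 else -np.ones(N - 1)
            rows.append(np.concatenate(([1.0], g, d)))
    X = np.array(rows)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    chi2_min = float(np.sum(w * (y - X @ beta) ** 2))
    gamma = np.append(beta[1:M], -beta[1:M].sum())
    delta = np.append(beta[M:], -beta[M:].sum())
    return chi2_min, beta[0], gamma, delta, (M - 1) * (N - 1)


# Hypothetical counts out of n = 100 that are exactly additive, so chi2_min is ~0.
print(additive_fit([[15, 25], [35, 45]], n=100))
```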

Empirical Comparison of the “Smooth” Test for Goodness of Fit with
Pearson's $\chi^2$ Test. J. Neyman, University of California.

In a previous publication* the author has deduced a test for goodness of fit, described
as the "smooth test" or the $\psi^2$ test, applicable to cases where the hypothesis tested, $H$,
is simple. The test is so devised as to be particularly sensitive to departures from $H$
which are "smooth" in the sense explained in detail in the publication quoted. Whether
the test so devised does present any advantage over the usual $\chi^2$ test depends on how
frequently we meet, in practice, cases where the hypotheses alternative to the one tested
are actually smooth.

The present investigation was undertaken with the object of obtaining some information
on this point. For that purpose a number of cases described in the literature, where there
was a question of testing that some observable variable $x$ follows some perfectly specified
distribution $p(x)$, were analyzed. Of all such cases, those were selected where there were
a priori theoretical reasons to believe that $p(x)$ could not possibly represent the true
distribution of $x$ and could at most be considered only as an approximation to the true
distribution.

It was assumed that the departures from the hypothetical distributions are typical of
those that may be met in practice when no definite information as to the actual state of
affairs is available. The hypothesis of goodness of fit was tested both by means of the
$\chi^2$ test and by the fourth-order smooth test. Out of the 130 cases studied the two tests
were in perfect agreement eight times. Out of the remaining 122 cases the smooth test proved
to be more sensitive than the $\chi^2$ in 70 cases and the $\chi^2$ better than the smooth test
in 62 cases. We may further compare the tests by counting those cases where one of them
detected the falsehood of the hypothesis tested at a given level of significance while the
other failed to do so. At the level of significance .05 the $\chi^2$ test rejected the hypothesis
tested 13 times while the probability level of the smooth test was > .05. The reverse was
true in 17 cases. At the level of significance .01 the corresponding figures are 5 and 14,
again in favor of the smooth test.
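The abstract does not restate the statistic. As it is usually presented from Neyman's 1937 paper (our summary, not a quotation of this abstract), each observation is first passed through the hypothesized cumulative distribution, $y_i = \int_{-\infty}^{x_i} p(u)\,du$, which is uniform on $(0, 1)$ if $H$ is true; then $\psi_k^2 = \sum_{r=1}^{k} u_r^2$ with $u_r = n^{-1/2}\sum_i \pi_r(y_i)$, where the $\pi_r$ are orthonormal Legendre polynomials on $[0, 1]$, and $\psi_k^2$ is referred to $\chi^2$ with $k$ degrees of freedom. A sketch of the fourth-order version, with hypothetical helper names and made-up example data:

```python
import math

# Orthonormal Legendre polynomials on [0, 1]: shifted Legendre polynomials times sqrt(2r + 1).
LEGENDRE_01 = (
    lambda u: math.sqrt(3) * (2 * u - 1),
    lambda u: math.sqrt(5) * (6 * u**2 - 6 * u + 1),
    lambda u: math.sqrt(7) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
    lambda u: 3.0 * (70 * u**4 - 140 * u**3 + 90 * u**2 - 20 * u + 1),
)


def smooth_test_psi2(y, k=4):
    """psi^2_k = sum_{r=1}^k u_r^2 with u_r = n**-0.5 * sum_i pi_r(y_i), where
    y_i is observation i transformed by the hypothesised cumulative distribution."""
    if not 1 <= k <= len(LEGENDRE_01):
        raise ValueError("this sketch supports orders 1 to 4")
    n = len(y)
    return sum((sum(pi_r(v) for v in y) / math.sqrt(n)) ** 2 for pi_r in LEGENDRE_01[:k])


def chi2_sf_4df(x):
    """P(chi-square with 4 degrees of freedom > x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


# Usage: H says x is exponential with mean 1, so y = 1 - exp(-x).
xs = [0.1, 0.4, 0.7, 1.3, 2.2, 0.05, 0.9, 3.1]
ys = [1 - math.exp(-x) for x in xs]
stat = smooth_test_psi2(ys, k=4)
print(stat, chi2_sf_4df(stat))  # reject H at level .05 if the tail probability is below .05
```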


* J. Neyman, "'Smooth Test' for Goodness of Fit," Skandinavisk Aktuarietidskrift,
1937, pp. 149-199.



REPORT OF THE WAR PREPAREDNESS COMMITTEE OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The generally recognized functions of a statistician are the calculation of
averages, percentages, and index numbers; the construction of bar graphs and
pie diagrams; and the compilation of data in general. His other activities
are less widely known. In particular, the recent advances in mathematical
statistics are known to a relatively small proportion of the persons occupying
responsible positions in academic life, in industry, and in government. The
mathematical statistician, in fact, is concerned chiefly with the interpretation
of data through the use of probability theory; his is the science of reasoning
from a part to the whole, and of prediction; and to him falls the task of stating
the conditions under which such inferences are possible, of devising means of
testing whether these conditions are satisfied, and of evaluating the
probability that such 'uncertain inferences' are correct in specific instances.
Furthermore, it is his responsibility so to plan the lay-out of experiments and the
conduct of surveys that the data they yield will contain the maximum
information on the points at issue and be amenable to unambiguous statistical
interpretation.

Because of the functions which the mathematical statistician can perform, his
services should be of value to the National Defense Program in the following
fields:


I. Quality Control and Specification. The functions of a mathematical 
statistical nature connected with quality control and specification of articles 
produced by mass production are: 

(1) Tests of randomness. These are important because statistical methods
of inference are strictly valid only for random samples.

(2) The use of probability theory in predicting the outcome of future repetitions
of an operation which is in a state of statistical control.¹ The evaluation of the
probability that the quality of a piece of product will be within any previously
specified tolerance limits as long as a state of statistical control is maintained,
and the development of sampling inspection techniques, are examples of this
function.


¹ A repetitive operation, such as a production process, is said to be in a state of statistical
control when it produces a sequence of observations which ... randomness. An important
aspect of quality control is the ... as the result of an effort to reduce a manufacturing
process ...

Furthermore, when this state of control is attained it is possible to obtain a reduction in
cost of inspection, a reduction in cost of rejections, a reduction in ... quality measurement
is indirect, and the attainment of uniform quality even though the inspection test is
destructive.
(3) Representative sampling. When a repetitive operation such as a production
process is not in a state of statistical control, it is not possible to make
valid inferences about the quality of a lot from an examination of a sample
from the lot unless the sampling process is one of random selection within
"strata" in accordance with the principles of representative sampling.

(4) Analysis of variance. Reference is made here to the technique whereby
the total variability of a product of an operation which is in a state of
statistical control can be decomposed into components associated with the various
sub-operations involved.

(5) Correlation methods. When a direct measurement of quality is extremely 
costly, it is sometimes advisable to use as an indirect measurement of quality 
the value of some character less costly to measure which is highly correlated 
with quality.

(6) Specification of quality as a variable. Statistical theory, including tests
for randomness, must be taken into account in writing quality specifications if
the consumer is to be protected against the vagaries of sampling and the
producer safeguarded against incurring penalties through unjust chance.
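Item (2) above mentions evaluating the probability that a piece of product will fall within previously specified tolerance limits while the process is in statistical control. A minimal sketch, under an added assumption that the report does not make: the quality characteristic of the in-control process is normally distributed with known mean and standard deviation. The helper name and the example figures are hypothetical.

```python
from statistics import NormalDist


def prob_within_tolerance(mean, sd, lower, upper):
    """Probability that one item falls within [lower, upper] when the process
    is in control and its quality characteristic is N(mean, sd**2)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical part: mean 10.00 mm, standard deviation 0.02 mm, tolerance 9.95 to 10.05 mm.
print(prob_within_tolerance(10.00, 0.02, 9.95, 10.05))  # about 0.988
```

With real product, the normality assumption itself would need checking before such a figure is trusted.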

II. Sampling Surveys. The importance of conducting sampling surveys 
in accordance with the principles of representative sampling is well established. 
It is quite possible that such surveys and partial censuses will be needed in 
connection with the National Defense Program in order to determine the 
frequency and location of individuals possessing special traits, e.g. persons 
capable of withstanding the rigours of dive bombing, or persons possessing 
types of color blindness which render them valuable as observers who can 
detect camouflage, etc. The “problem of sizes” connected with Stores and 
Supplies—see below—may require careful preliminary surveys. Also, surveys 
may be needed to evaluate the effects of various types of propaganda. 

III. Experimentation of Various Kinds. The mathematical statistician
can be of service in connection with experimentation of various kinds
undertaken as a part of the National Defense Program, since the following aspects
of experimentation are of a mathematical statistical nature:

(1) Randomization. Since statistical tests for the existence of differences
between samples, of correlation, etc. are strictly valid only for random samples,
the operation of randomization is of paramount importance in "the comparison
of new designs, new materials or alloys, study of contact phenomena under
different conditions, corrosion of materials under different atmospheric
conditions, and field trial of equipment, to mention only a few." If randomization
is not undertaken, observed differences between designs, for instance, may have
arisen from non-random assignable differences in the material presented.
Furthermore, the validity of tests for significant differences between the effects
of various designs rests upon the condition that the variability observed in
the effects of each design be of random character and free from trends and
non-random shifts in magnitude; i.e., the operation of determining the effects
of each design must be in a state of statistical control, to use a phrase employed
in quality control.

(2) Experimental design. Without careful attention to the lay-out of an
experiment, the data it yields may be difficult and even impossible to interpret.
Therefore, the principles of experimental design set forth by R. A. Fisher and
his followers are of great importance, as are also the special experimental
arrangements which have been devised to cope with many of the more usual
difficulties met in practice.

IV. Personnel Selection. The allocation of individuals to places where
they can be of greatest value in the National Defense Program will
undoubtedly require tests for mental and physical traits. Although the development
and analysis of such tests is largely in the hands of psychometric groups, the
use of methods of multivariate statistical analysis in such work renders this
field one in which mathematical statistics ought to play an important role.


It is in the above four fields that there is special need for the training and 
endowments of the mathematical statistician. He can also render valuable 
assistance in the following fields: 


V. Stores and Supplies.

(1) Problem of sizes. Preliminary surveys are likely to prove useful in
ascertaining the relative frequencies of demand for the respective sizes of
clothing, etc. in different parts of the country.

(2) Development of procedures for charting the day-to-day location and
movement of stores and supplies.

(3) Problem of replacement of parts and equipment. In many cases it is more
economical to make replacements at statistically determined times than to wait
for complete failure.


VI. Transportation and Communication. Probability theory has shown
its usefulness in peace time in handling "traffic" problems that arise in telephone
and telegraph communication, electric power distribution, etc. No doubt it
will find corresponding application to problems in these fields arising out of the
National Defense Program.

VII. Gunnery and Bombing. Although there is a need, in artillery fire, for
further development of methods of estimating standard deviations from successive
differences, in order to minimize the biases arising from slowly changing conditions
during the period of firing, the methods used there are quite firmly established, and
the relatively new science of bombing is likely to present greater opportunities for
the application of the methods of mathematical statistics. For instance, in evaluating
bombing techniques there is need of statistical methods in separating the constant
biases from the random variability.
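Section VII mentions estimating standard deviations from successive differences so that slowly changing conditions bias the estimate less. The report does not name a particular estimator; one standard choice, assumed here, is the mean square successive difference: $\hat{\sigma}^2 = \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2 / [2(n-1)]$, which is unbiased for independent observations with constant mean. The helper name and the drift-only data are hypothetical.

```python
import statistics


def successive_difference_variance(xs):
    """Estimate sigma^2 by the sum of (x_{i+1} - x_i)^2 divided by 2(n - 1).

    For independent observations with a constant mean, each squared difference
    has expectation 2 sigma^2, so the estimate is unbiased; a slow drift in the
    mean inflates it far less than it inflates the ordinary sample variance."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    diffs = [b - a for a, b in zip(xs, xs[1:])]
    return sum(d * d for d in diffs) / (2 * (len(xs) - 1))


# Hypothetical points of impact drifting steadily with no random error at all.
rounds = [0.0, 1.0, 2.0, 3.0]
print(successive_difference_variance(rounds))  # 0.5
print(statistics.variance(rounds))  # about 1.667
```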
VIII. Meteorology. The extent to which statistical methods are being
employed in meteorology can be seen from an examination of the Monthly
Weather Review Supplement No. 39, issued April 1940, and entitled "Reports
on Critical Studies of Methods of Long-Range Weather Forecasting." There
seems to be excellent opportunity here for the application of methods of
multivariate analysis and for the development and use of methods applicable to
serially correlated data. Such work would be of value in National Defense
insofar as it would enable the forecasting of conditions suitable for launching an
attack.

IX. Medicine. The National Defense Program will probably require the
preparation and storage of hormone substances, toxic compounds, drugs, and other
medicinal supplies. Since many of these are examined for potency, toxicity, etc.
by means of animal assays, there will be considerable opportunity here for
the sound application of mathematical statistics in planning and interpreting
these bioassays.

In nearly all of the above activities the application of mathematical statistics 
is likely to encounter two major difficulties: 

(1) Obtaining an adequate trial of the methods of mathematical statistics. 

(2) Supplying persons to occupy key positions in the application of mathematical
statistics in a given field: persons who are competent in mathematical statistics
and who possess a sound background in the field of application.

In some of the above activities, e.g. Quality Control, there will be the further 
difficulty of 

(3) Supplying the vast number of slightly trained workers who will gather 
the data and perform the analyses. 

It is with these difficulties in mind that the Committee recommends that the 
Institute 

(1) Prepare a register of Institute members,¹ stating for each member his
background, interests, and experience so far as these relate to mathematical
statistics and its applications;

(2) Appoint a committee to handle inquiries concerning personnel qualified 
to deal with particular projects; 

(3) Cooperate to the fullest extent in matters pertaining to quality control
and specification with the Joint Committee for the Development of Statistical
Applications in Engineering and Manufacturing, of which the Institute is a
sponsor;²


(4) Undertake such steps as are feasible which will lead to cooperation with
other organizations having interests similar to those of the Institute, e.g. the
American Statistical Association, the Psychometric Society, and the Econometric
Society;

(5) Establish contact with the National Defense Research Committee headed
by Dr. Vannevar Bush and coordinate the Institute's activities with those
of this national Committee.


In conclusion, we feel that as an organized group the Institute's primary
function in relation to the National Defense Program should be to serve as a
reservoir of specialists, experienced in the use of the methods of mathematical
statistics, who can direct the use of these methods and be of assistance in the
development of new techniques as needed. As a secondary, but equally
important function, the Institute is in a position to supervise, and perhaps to
undertake through the activities of its individual members, the training in
mathematical statistics of the individuals who will be needed in the application
of whatever statistical programs of the type noted above are undertaken in
connection with the National Defense Program. It is recommended, therefore,
that the Institute's interest in the above activities, and its willingness to be called
upon, be adequately publicized, possibly by sending copies of this report to various
members of the Government, such as the Chief Signal Officer and the Coordinator
of National Defense Purchases, and also to the secretaries of appropriate
organizations, such as the American Standards Association, with the request
that they advise the Institute of any specific action they feel the Institute
should take.


¹ The preparation of this register should be coordinated with any similar undertaking
sponsored by the National Roster of Scientific and Specialized Personnel, National
Resources Planning Board, Executive Office of the President, Washington, D. C.

² We suggest the following as possible undertakings in a cooperative program with the
Joint Committee.

(1) Requesting statements regarding the potential contribution to National Defense
of statistical methods in quality control and specification from men prominent in industry
who are familiar with recent developments in quality control. ... be asked to give, where
possible, concrete evidence of the value of such methods in their experience, evidence
which would be helpful in securing authoritative acceptance of ...

... The arrangement of ... universities in a few large industrial centers ... men in local
industries who ... serve as chairman. To such a meeting would ... methods to their
problems, and ... quality control.

A. T. Craig
E. G. Olds
L. E. Simon
R. E. Wareham
C. Eisenhart, Chairman





